# Jeffrey Zheng Editor

# Variant Construction from Theoretical Foundation to Applications

Editor: Jeffrey Zheng, School of Software, Yunnan University, Kunming, Yunnan, China

ISBN 978-981-13-2281-5 ISBN 978-981-13-2282-2 (eBook) https://doi.org/10.1007/978-981-13-2282-2

Library of Congress Control Number: 2018958351

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Dedicated to I-Ching (First Variant Construction); Lan Z. Yin & Su M. Zheng (Mother & Father); Qing S. Gao (Mentor on Parallel Sorting Algorithm & Computer Architecture); Tosiyasu L. Kunii (Master on Meta Knowledge); Bob Beaumont (Adviser on Optimization); Ping Zhang (Wife); Graduate School of USTC & UCAS (40th Anniversary, 1978–2018)

# Foreword

Dr. Jeffrey Zheng was one of the first postgraduate students supervised by Prof. Qingshi Gao (Member, Chinese Academy of Sciences) at the Institute of Computing Technology, Chinese Academy of Sciences. I have known Dr. Zheng for the 40 years since then. Building upon his postgraduate work (Parallel Sorting Algorithm and 0-1 Transformation), Dr. Zheng has made significant contributions to the field of Variant Construction, ranging from theoretical foundations to various applications. His research has been published in many academic journals and conference proceedings. For the convenience of readers, Dr. Zheng has compiled his representative works of 40 years into two monographs with complementary contents. I believe that professionals in related fields will find this book both an excellent reference and a source of inspiration, and that other readers will enjoy it as an introduction to Variant Construction. I am very happy to recommend this book in the form of a foreword.

Yunmei Dong, Professor, The Institute of Software, Chinese Academy of Sciences; Member, Chinese Academy of Sciences. Beijing, China, April 2018

As head of the R&D team for Lenovo Chinese Systems, I am very pleased to see the research work of my former colleague Dr. Jeffrey Zheng, which began 30 years ago with the "Smoothly Enlarging Chinese Font Algorithm of 0-1 logic operations" at the Institute of Computing Technology of the Chinese Academy of Sciences. His most recent work, "Variant Construction", is summarized here as a professional monograph. I expect this new measurement system to be used efficiently for advanced cryptographic tests in modern cyberspace security. I am pleased to give this foreword.

Guangnan Ni, Professor, The Institute of Computing Technology, Chinese Academy of Sciences; Member, Chinese Academy of Engineering. Beijing, China, April 2018

Dr. Jeffrey Zheng and I were in the first group of postgraduates majoring in Computer Architecture at the Graduate School of the Chinese Academy of Sciences 40 years ago. Professor Qingshi Gao (Member, Chinese Academy of Sciences) supervised him, in particular in the areas of parallel algorithms and computer architecture.

Dr. Jeffrey Zheng is one of the few classmates who continue to work in basic research and advanced applications. It is great that Dr. Jeffrey Zheng has collected his research work in a monograph. Variant Measurement Technology could be used in the next generation of Quantum Cryptographic Communication Services.

On the occasion of the 40th anniversary of the Graduate School of the Chinese Academy of Sciences, I would like to express my good wishes as a classmate for this monograph in the foreword.

Guojie Li, Professor, The Institute of Computing Technology, Chinese Academy of Sciences; Member, Chinese Academy of Engineering. Beijing, China, April 2018

# Preface

With the rapid development of science and technology in the twenty-first century, modern computer systems and the optical fiber communication networks that support the global Internet have had a profound influence on society and the economy. As a result, globalization has become an extremely important issue in social and economic systems. The Internet and optical fiber communication systems have revolutionized the geographic and communication patterns of the world by creating an open era of integrated global connectivity. Quantum key communication technology and quantum entanglement experiments on a quantum satellite are typical examples of China's world-leading science and technology in frontier application research. The latest achievements of artificial intelligence, led by AlphaGo, show the potential of advanced technology based on deep learning, artificial neural networks, and knowledge-based support vector machine systems. Related achievements are very attractive: poetry robots, service robots, industrial robots, face recognition, gesture recognition, unmanned aerial vehicles, self-driving cars, and unmanned underwater vehicles. A long list of military and civilian high-tech achievements enriches daily life with intelligent products.

From the viewpoint of mathematics and logic, the framework used to design and simulate both modern computer systems and optical fiber communication networks depends on the 0-1 logical system and representations of multiple-bit states. For integrated circuits, the theoretical basis can be traced back to the 1930s, when Shannon applied Boolean algebra to circuit design, establishing switching circuit theory, and Turing proposed the Turing machine; von Neumann later established the modern computer architecture. More than 50 years of subsequent development has followed Moore's Law: the observation that the number of transistors in a dense integrated circuit doubles approximately every 2 years. Optimized very large-scale integrated circuit technology now appears everywhere, with ever more remarkable functions.

Looking ahead, the development of advanced science and technology is limited by the basic theory that supports it. From the perspective of basic research, how to extend this classical foundation is a very interesting question and an extremely difficult research topic.

#### Purpose of This Book

In the course of four decades of exploration of 0-1 logical systems, the authors extended vector 0-1 logical systems to establish a variant logic framework in 2010. After nearly a decade of further research and development, three theoretical components were established: variant logic, variant measurement, and variant map. At the same time, various sample applications were investigated and developed. However, because most published papers are scattered across professional journals, conference proceedings, and academic books, it is difficult for other people to obtain comprehensive information on the topic.

In addition, each article focuses on a specific issue, and it is difficult for readers to understand the whole structure from a few papers. This book organizes the relevant papers and is the first book on variant construction to present the selected papers with their intrinsic logical connections. The selected papers are grouped into parts, so that different readers can easily find suitable content in specific chapters.

#### The Need for a New Logic System

In modern computer and communication systems, switching circuit theory uses multiple bits, states, and logic operations to build the state automata and combinatorial logic units from which complex computing and communication systems are designed and implemented. For solving linear equations in n variables, whether algebraic, Boolean, or differential, it is useful to apply a matrix associated with a set of eigenvectors. Matrices and eigenvalues provide valid solutions to periodic problems, such as those with a special basis of periodic functions or periodic boundary conditions. However, it is difficult for periodic models to resolve the exhaustive cases that arise under quasi-periodic, nonperiodic random, and chaotic conditions. For example, modern cryptographic generation/analysis systems such as block ciphers depend on a Substitution–Permutation Network (SPN). This type of network, which transforms n-bit input vectors into n-bit output vectors, includes permutation operations, and the total number of configuration functions is proportional to 2<sup>n</sup>!. From a measuring viewpoint, cryptographic sequences need measurements, analysis models, and methods whose complexity is far beyond what state automata and combinational logic circuits can support.
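To give a sense of scale for the factorial count mentioned above, the following sketch reports how many bits would be needed just to index every permutation of the 2<sup>n</sup> states of an n-bit vector. It is an illustration only, not taken from the book: it assumes the expression is read as (2<sup>n</sup>)!, and the function name `log2_state_permutations` is hypothetical.

```python
import math


def log2_state_permutations(n: int) -> float:
    """Return log2((2**n)!), i.e. the number of bits needed to index
    every permutation of the 2**n states of an n-bit vector."""
    states = 2 ** n
    # lgamma(k + 1) equals ln(k!), so the huge factorial is never built.
    return math.lgamma(states + 1) / math.log(2)


if __name__ == "__main__":
    for n in (2, 3, 4, 8, 16):
        bits = log2_state_permutations(n)
        print(f"n={n:2d}  states={2 ** n:6d}  log2((2^n)!) ~ {bits:.1f} bits")
```

Working with the logarithm rather than the factorial itself keeps the computation cheap even for values of n where the exact integer would have millions of digits.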

Modern digital computing and communication technologies are based on classical logic systems. Faced with the huge amounts of data on the global Internet, models such as deep learning, artificial neural networks, and knowledge-based support vector machines cannot represent the exponentially increasing number of internal states. Although the Fourier transform and wavelet transform are the most important tools for modern spectrum analysis, such periodic schemes have significant limitations when processing arbitrary random states and aperiodic types of complex functions in big data environments. It is difficult to obtain convergent results for random applications. Quantum mechanics and modern photonic–electronic applications have confirmed the effectiveness of this frontier science.

Nobel Prize winner G. 't Hooft proposed a cellular automaton interpretation of quantum mechanics. The research results suggest that classical logic and quantum mechanics share common ground at the Planck scale, in the 10<sup>−43</sup> range. It is necessary to use 0-1 vectors under permutation conditions to represent quantum states. From a counting viewpoint, the complexity of such structures is related to 2<sup>n</sup>!.

In classical statistics, the Ising model provides an analysis mechanism for 0-1 states. Under the assumption of exhaustive states, an exact solution can be compared with the mean field on one- and two-dimensional lattices. Whether an exact solution exists in general under a random permutation distribution is an interesting topic worth further exploration. Modern experiments have made good progress in advanced nanotechnology, fiber optics, laser photonics, and ultrafast laser pulses in quantum optics. Advanced nanotechnology experiments can distinguish a series of quantum block/surface/line and dot structures from the macro- to the nanoscale, and the relevant emission and absorption spectra can be observed. Both the wider continuous spectrum of thermal noise and the narrower discrete spectrum of coherent laser beams are observed. In current research problems, the measurement models and methods discussed are far from the quantum scale, and all results can be described by modern probability and statistics. However, because of the complex shift operations on the phase space of permutations, modern statistical and probabilistic methods and tools have difficulty handling symmetric groups directly under arbitrary random permutation requirements.

From a stochastic analysis viewpoint, advanced Quantum Key Distribution (QKD) needs an effective measurement model and quantitative method to identify the source of a random sequence: is it a truly random sequence generated from a quantum random source, or a pseudo-random sequence generated by a stream cipher? This classification cannot be made using the NIST random testing package. Spectrum analysis and linear equation tools cannot be applied to this type of target either. More advanced models and methods are required.

For a 0-1 vector with multiple bits, analysis tools use classical probabilistic and statistical models and methods. Since the specific problem of randomness testing is far beyond combinatorial analysis and state automata, these tools struggle to meet the demands of actual measurement and quantitative analysis because of the ultra-high complexity of substitution and permutation in complicated modes. Just as modern physics builds on classical statistics, it is necessary to establish a solid logic foundation that supports permutation and substitution operations within the logic mechanism itself, extending the analytical frontier for both theoretical foundations and practical applications.

Across mathematical logic, automatic control, quantum mechanics, artificial intelligence, and other fields that use probability and statistics, random sequence analysis and measurement based on n-variable 0-1 vectors and their linear combinations cannot meet the measurement requirements of various applications. Modern measuring methodology and technology need permutation and substitution operations at different levels of the logic foundation to satisfy frontier measurements in quantum physics, cryptography, and artificial intelligence. From a measuring viewpoint, a new measuring system is urgently required to deal with advanced applications.

#### Overview of Modern Group Theory

From a discrete representative viewpoint, every abstract group is isomorphic to a subgroup of the symmetric group of some set (Cayley's theorem) and permutations are the core basis in modern group theory.

The beginning of modern group theory can be traced back to Galois' contribution in the 1830s. In the 1870s, Klein studied transformation groups and proposed the Erlangen program, which presents group theory as an invariant structure for symmetrical patterns and transformations. Inspired by Klein, Lie used infinitesimal symmetry transformations to establish the Lie algebra system.

Using multiple tuples of variable structures, Hamilton proposed complex and quaternion expressions. Influenced by Gordan's work on invariant formulas, Hilbert used a finite basis to construct a complete system of algebraic structure on n variables. In 1906, an infinite-dimensional Hilbert space of complex variables was developed. Based on series of automorphic functions, Poincaré was the first person to discover a chaotic deterministic system, which laid the foundations of modern complex dynamic systems, fractals, and chaos theory.

Noether's investigations of Einstein's general relativity led to Noether's theorem, which determines a conserved quantity for every physical law that possesses a continuous symmetry. A series of studies on invariants and symmetries promoted the development of abstract algebra in the 1930s by refining algebraic structures such as groups, rings, algebras, fields, and lattices.

In the 1930s, Weyl established the group theory of quantum mechanics, whose theoretical basis rests on symmetry operators. From the 1940s, Hua developed a complex matrix representation under the symplectic group using the unit circle as the core. In the 1950s, Yang proposed gauge invariance, which plays a foundational role in modern field theory. Chern established the fiber bundle structure for the differential geometry of complex functions.

Since the 1980s, gauge field theory has been the basic mathematical tool of modern physics. The eightfold/tenfold way of the quark model plays a key role in the standard model of particle physics and in the exploration of grand unified theory; the corresponding group structures are SU(3)/SU(5).

#### Brief History on 0-1 Logic Systems

From the perspective of the development of mathematical logic, the origin of the modern 0-1 logic system can be traced back to Leibniz's invention of binary counting and combinatorial analysis in the 1670s. In the 1850s, Boole proposed Boolean algebra; in the 1900s, the logicist school made logic the foundation of modern mathematics.

In the 1930s, Gödel proved his incompleteness theorems, which show that a sufficiently expressive formal system contains statements that cannot be proved within it; this bore directly on Hilbert's program and his decision problem. In 1936, Turing defined the Turing machine, which performs read/write operations on a 0-1 sequence of unbounded length. Together with Church's lambda calculus, the Church–Turing thesis laid the theoretical foundation of computability and recursion theory.

Using 0-1 variables and logic operators, Shannon proposed switching theory in 1937, providing the basis for module design, simulation, and implementation in modern computers and communication systems. Over more than half a century of revolutionary development in semiconductor chips, electronic circuits have progressed from discrete components to integrated circuits and then to very large-scale integrated circuits, and switching theory has provided a solid foundation for the basic theory, application analysis, and design tools.

Although the modern logic system originally developed from Leibniz, the use of permutation modes in state transformations can be traced back several thousand years in oriental history. In the I-Ching system, Yin and Yang representations are identified as the roots. Five thousand years ago, Fu-hsi proposed eight trigrams as an initial set; in modern mathematical terms, the three layers of Yin/Yang in each trigram are equivalent to the eight states of three 0-1 variables. Three thousand years ago, King Wen of the Zhou dynasty proposed a different order of the eight trigrams, that is, a permutation of the Fu-hsi arrangement. In the 1050s, Shao Yung proposed a balanced binary tree as a natural order of the binary system, the same as Leibniz's binary counting.
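The correspondence between the eight trigrams and the eight states of three 0-1 variables can be sketched in a few lines. The encoding used here (Yin as 0, Yang as 1, listed in binary counting order) is an assumption made for illustration, and the traditional Fu-hsi and King Wen orders themselves are not reproduced. The sketch only shows that any reordering of the eight states is one of 8! possible permutations of the same set.

```python
from itertools import product
from math import factorial


def three_bit_states():
    """Enumerate the eight states of three 0-1 variables in binary counting order.

    Assumed encoding for illustration: Yin -> 0, Yang -> 1, three lines per trigram.
    """
    return list(product((0, 1), repeat=3))


def is_permutation_of_states(order):
    """Check that `order` lists every three-bit state exactly once."""
    return sorted(order) == three_bit_states()


if __name__ == "__main__":
    states = three_bit_states()
    for index, bits in enumerate(states):
        print(index, "".join(map(str, bits)))
    # The reversed listing is one example of another valid order of the same set.
    print(is_permutation_of_states(states[::-1]))
    print("distinct orders:", factorial(len(states)))
```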

Ancient oriental philosophers developed the logical foundation of traditional Chinese culture using this Yin/Yang symbol system. However, it must be pointed out that this system contains subsets of states with various logic paradoxes at different levels. A dialectical logic system based on the I-Ching has difficulty meeting a list of important characteristics of formal logic: consistency, completeness, noncontradiction, soundness, etc.

#### Modern 0-1 Vector Algebra

When 0-1 vectors and logic operators are used in vector operation mode, it is natural to extend parallel bit operations from a single bit to multiple bits. For bit operations to be performed effectively on multiple bits, it is also necessary to implement permutation operations among bits. A convenient approach is to define pairs of bits at a fixed distance together with cyclic shift operations on a given vector.
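A minimal sketch of this idea follows, modelling an n-bit vector as a Python integer. The function names are hypothetical, and the choice of AND as the operation applied to each pair of bits is an assumption made only to show how a cyclic shift lines up bits at a fixed distance for a single parallel operation.

```python
def cyclic_shift_left(value: int, shift: int, width: int) -> int:
    """Rotate the lowest `width` bits of `value` left by `shift` positions."""
    mask = (1 << width) - 1
    shift %= width
    value &= mask
    # Bits pushed past the top re-enter at the bottom.
    return ((value << shift) | (value >> (width - shift))) & mask


def pair_bits_at_distance(value: int, distance: int, width: int) -> int:
    """AND every bit with the bit `distance` positions below it (cyclically).

    One shift plus one bitwise operation handles all `width` pairs at once.
    """
    return value & cyclic_shift_left(value, distance, width)


if __name__ == "__main__":
    v = 0b1011
    print(format(cyclic_shift_left(v, 1, 4), "04b"))
    print(format(pair_bits_at_distance(v, 1, 4), "04b"))
```

The same pattern works with OR or XOR in place of AND; only the pairing step changes.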

In the 1970s, Lee described cyclic shift operations in Modern Switch Circuit Theory and Digital Design. The canonical forms of vector switching functions are extremely complex, and they provide very powerful transformations.

Associated with advanced developments in block ciphers in cryptography, new vector instruction set extensions such as Advanced Vector Extensions (AVX) have been developed. The AES-NI instruction set, developed specifically for the AES cipher algorithm, shows the latest achievements for block ciphers.

With this type of vector permutation–substitution component, complex cryptographic algorithms can perform encryption and decryption efficiently using permutation and substitution instructions.

#### Introduction to Variant Construction

In the 1980s, the author studied the sorting problem on a vector of N integer elements using the symmetric group under 0-1 vector control, and constructed high-performance parallel sorting algorithms. Then, smoothly enlarging algorithms for Chinese fonts were proposed using logic operations on 2D bitmaps. In the 1990s, multiple levels of invariants were used to organize a state set as a phase space, and the conjugate classification and transformation of binary images were established.

In 2010, a new vector logic system, Variant Logic, was proposed using two composite operations: permutation and complement. After 8 years of in-depth exploration, variant construction now comprises three core components: variant logic, variant measurement, and variant map.

Using four meta states, multiple probability and statistical measurements can be constructed. Building on these measurements with quantitative expressions and combinatorial projections, more than 60 research papers and book chapters have been published, covering topics from theoretical foundation to sample applications. Since these papers were published in various places all over the world, it is difficult for readers to collect them systematically for further reading. This book is the first to collect the most relevant papers and to organize variant construction systematically as variant logic, variant measurement, variant map, meta model, and sample application.

#### The Organization of This Book

This book is composed of nine subparts in two main parts: theoretical foundation and sample application. The theoretical foundation is composed of four subparts: Variant Logic, Variant Measurement, Variant Map, and Meta Model.

Variant Logic describes n-variable 0-1 vectors with 2<sup>n</sup> states, which form a variant configuration space with 2<sup>n</sup>! × 2<sup>2<sup>n</sup></sup> members.

Variant Measurement is defined on n-tuple 0-1 vectors; four meta measures and ten expansion operators are established.

Variant Map illustrates 2<sup>n</sup> states and 2<sup>2<sup>n</sup></sup> transforming states; multiple statistical probability distributions are investigated using the four meta measures and their combinations in higher dimensional distributions.

Meta Model describes a concept cell model of knowledge representation and a multiple probability model of voting.

The sample application part is composed of five subparts: Global Visualization, Quantum Interaction, Random Sequence, DNA Sequence, and Multi-valued Pulse Sequence. In Global Visualization, a list of function maps is applied to medical image analysis and to the exhaustive arrangement of cellular automata rule spaces. In Quantum Interaction, conditional and relative probability distributions simulate the quantum interactive effects of two paths. Random Sequence provides variant random number generators and a unified measurement model that handles both pseudo-random and truly random sequences in modern cryptographic applications on variant maps. In DNA Sequence, whole gene sequences are mapped on variant maps. In Multi-valued Pulse Sequence, bat echo and ECG sequences are mapped on variant maps.

#### Suitable Readers of This Book

This book covers a wide range of topics from theoretical foundation to sample applications, and different parts suit different groups of readers. Variant Logic, Meta Model, and Variant Measurement are useful for researchers and graduate students doing basic research on logic, probability, statistics, analysis, and measures in mathematical foundations, combinatorial mathematics, metamathematics, quantum logic, and combinatorial group theory. Variant Measurement and Variant Map are suitable for application researchers and engineers in big data, complicated system analysis, feature extraction, artificial intelligence, and applied mathematics, as well as software engineers, senior college students, and postgraduate students. Variant Map and the sample applications are suitable for those working on complex system analysis and design, including data engineers, big data engineers, artificial intelligence engineers, application development engineers, postgraduates, and senior undergraduate students.

Jeffrey Zheng. Kunming, Yunnan, China, April 2018

# Acknowledgements

The author would like to thank colleagues Chris Zheng, Jianzhong Liu, Tao Chen, Yuzhong Luo, Tong Li, Yixian Yang, Lizhen Li, Zhengfu Han, Dawu Gu, Weizhong Yang, Jing Luo, Wei Zhou, Shaowen Yao, Lian Lu, Yinfu Xie, Chu Zhang, Xiazhou Yang, Xiaoyun Pu, Weilian Wang, Lu Shan, Ying Lin, Yunchun Zhang, Dennis Heim, Olga Heim, and Colin Campbell for their criticism, encouragement, suggestions, discussions, corrections, and help of various kinds with this book.

I am particularly grateful to my students for the past 10 years: Bingjing Cai, Wenjia Zhao, Qin Kang, Qinping Li, Zhiqiang Yu, Yao Zhou, Jie Wan, Huan Wang, Jie-ao Zhu, Qinxian Bu, Weiqiong Zhang, Zu Wan, An Wang, Yuqian Liu, Lei Du, Ruoyu Shen, Heyuan Chen, Yan Ji, Guoxiu Zhai, Pingan Zeng, Wenjia Liu, Ruoxue Wu, Lixin Wu, Zhonghao Yang, Lihua Leng, Zhihui Hou, Yuyuan Mao, Yamin Luo, Zhefei Li, Yifeng Zheng, and many other students in a series of research courses and projects to explore extensive topics from data streams of binary/DNA/multiple-valued sequences to wider applications under variant construction.

I specially thank Tosiyasu Kunii and Bob Beaumont for their lifetime friendship; their encouragement and information guided us to explore meta models, various applications on binary/DNA/ECG sequences, and other complicated signals in variant construction.

I sincerely thank the four main funding sources that supported the completion of this book.


# Contributors

D. M. Heim Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China

O. Heim Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany; Animal Ecology, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany

Zhihui Hou Yunnan University, Kunming, China

Qingping Li School of Software, Yunnan University, Kunming, China

Zhefei Li Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China

Wenjia Liu Yunnan University, Kunming, China

Jin Luo School of Life Sciences, Yunnan University, Kunming, China

Yamin Luo Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China

Yuyuan Mao School of Software, Yunnan University, Kunming, China

Ruoyu Shen School of Software, Yunnan University, Kunming, China

Jie Wan Yunnan University, Kunming, China; The People's Bank of China, Kunming, China

Huan Wang Yunnan University, Kunming, China

Weizhong Yang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China; Key Laboratory of Quantum Information of Yunnan, School of Software, Yunnan University, Kunming, China

Zhonghao Yang Yunnan University, Kunming, China

P. A. Zeng Yunnan University, Kunming, China

Weiqiong Zhang School of Software and Microelectronics, Peking University, Beijing, China

Chris Zheng Tahto, Sydney, Australia; Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China

Jeffrey Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China; Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China; Key Laboratory of Yunnan Software Engineering, Yunnan University, Kunming, Yunnan, China

Wei Zhou School of Software, Yunnan University, Kunming, China

# Part I Theoretical Foundation—Variant Logic

I-Ching has three key properties: 1. Simple, 2. Variant, 3. Invariant. —Zheng Xuan

The Monad, of which we shall here speak, is nothing but a simple substance, which enters into compounds. By simple is meant without parts.

—Gottfried W. Leibniz

Quaternions came from Hamilton after his really good work had been done, and though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.

—Lord Kelvin

From a historical viewpoint, the first paper of variant logic foundation (A framework to express variant and invariant functional spaces for binary logic) was published in Frontiers of Electrical and Electronic Engineering in China, Higher Education Press and Springer 5(2):163–167 (2010). An extensive book chapter (Chapter "A framework of variant-logic construction for cellular automata") was published in the OA book of Cellular Automata—Innovative Modelling for Science and Engineering:325–352 (2011) by InTech Press to describe a variant logic framework systematically.

Part I is composed of two chapters (1–2).

Chapter "Variant Logic Construction Under Permutation and Complementary Operations on Binary Logic" shows the core construction of variant logic under two vector operations (Permutation, Complement) on 0-1 logic.

Chapter "Hierarchical Organization of Variant Logic" describes complex hierarchical organization under variant logic construction to compare with other logic systems.

# **Variant Logic Construction Under Permutation and Complementary Operations on Binary Logic**

**Jeffrey Zheng**

**Abstract** This chapter presents a binary logic framework whose function elements are invariant under permutation and complementary operations. The entire framework is described using 4 levels of hierarchy: *n* variables, 2<sup>*n*</sup> states, 2<sup>2<sup>*n*</sup></sup> functions, and 2<sup>*n*</sup>! × 2<sup>2<sup>*n*</sup></sup> logic functionals. Under the proposed framework, it is possible to determine higher level function complexity by analysing lower levels of organisation characteristics. These characteristics can be determined quite accurately because the symmetry conditions of variable and state organisations have invariant logic functions and a corresponding logic functional organisation. More symmetrical arrangement at state level creates more symmetrical permutations within the function space. Lower level properties are highly influential on the higher level properties of function components within a logic functional space. The proposed framework provides a logic foundation to describe complex binary systems using lower level properties, making analysis of systems more efficient and less calculation intensive. Different global coding schemes are discussed and typical two-variable cases of logic functionals are illustrated.

**Keywords** Vector permutation · Complement · Variant logic · Functional space · Binary logic framework

J. Zheng (✉), Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China, e-mail: conjugatelogic@yahoo.com

J. Zheng, Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_1

#### **1 Introduction**

Mathematical invariance [1, 2] is key in the understanding and development of new scientific theories and technologies [3]. Most scientific theories rely on invariant properties of group behaviour and transformations [4] to describe the rules of the world we live in. Theories such as relativity and quantum mechanics all rely on invariance properties for their constructs [5]. In the field of mathematical logic, the construction of theoretical frameworks [6, 7] focuses upon three hierarchical levels: variables, states and function spaces. Boolean algebra and switching theory [8, 9] exploit combinatorial invariant properties, and use these foundational properties for implementing new theories and applications.

For reasons of consistency and symmetry of structure, logical operations are restricted to two types of canonical forms, namely the product-of-sums and the sum-of-products forms. Any complex logic function can be rewritten in either of these two canonical forms. The use of a truth table enables analysis and the transformation into the canonical representations [6].

Following the introduction of Conway's Game of Life [10], Stephen Wolfram started in the 1980s [11, 12] to apply Boolean algebra to describe the behaviour of Cellular Automata. His approach used a binary counting sequence to name different rules of behaviour based upon the functions generating the next iteration in the game. Wolfram identified four classes of transformations within the rules of Cellular Automata (CA). His findings are published in his book *A New Kind of Science* [13]. The main method of analysis in this area of research chooses a CA operation and recursively applies the operation to different initial conditions to find emergent patterns from the process. This approach creates many interesting results that can be visually identified [14, 15].

In the analysis of dynamic systems, it is essential to identify transformation spaces with functional invariance [16, 17]. An example in physics is phase space [2]. The phase space plays an essential role in describing key properties of a given dynamic system. Phase characteristics are more difficult to construct under a logic framework. A mechanism for linking lower level characteristics with higher level properties such as symmetry currently does not exist. Under combinatorial logic, different permutations add no additional information to access information in phase space [14].

#### *1.1 Western and Eastern Logic Traditions*

Beginning with Aristotle (384–322 B.C.), the foundations of Western logic have played a key role in the development of today's global society [18]. The modern theory of logic systems rests on a series of outstanding individuals and their contributions: G. Leibniz and the introduction of the Binary Number System (1646–1716) [19, 20]; G. Boole and the development of Boolean Logic (1854) [21]; G. Cantor and Set Theory (1879); G. Frege and Conceptual Logic (1879) [22, 23]; B. Russell and Russell's Paradox (1910) [24]; J. Lukasiewicz and Multiple-Valued Logic (1920); D. Hilbert and Foundations of Geometric Logic (1923) [25]; K. Gödel and his Incompleteness Theorem (1931) [22]; A. Turing and the Turing Machine (1936) [26]; C. Shannon and Switching Theory (1937) [27]; H. Reichenbach and Probability Logic (1949) [28]; and L. Zadeh and Fuzzy Logic (1965) [29]. The development of such theorems and mathematical frameworks has enabled Western culture to understand the operation of our world as a set of implementable rules. Logic and the development of rules for the expression of logic have provided a language that enabled the construction of today's scientific societies.

In contrast to the binary on–off nature of Western logic, Oriental culture has been influenced by spiritual traditions of balance and harmony. The theme of balance is summarised in the I-Ching or 'The Book of Changes', one of the most influential books of classic Oriental literature [30–37]. The concept of Yin and Yang and the subtle interplay of these two opposing forces yield combinations and permutations of change. Oriental philosophy held that 'the only constant phenomenon is change', and such a worldview emphasised the dynamic nature of a system; rather than focusing on the individual states of a system (on, off), prominence was instead placed on operations that yield change (on to off, off to on). The structure of thought introduced by the I-Ching allowed change to be systematically documented and analysed. Complex interactions, cyclic behaviour and the interplay of nature at all levels of Oriental culture (sociology, literature, medicine, astrology and religion) could be described using the tools of dynamic logic provided by the I-Ching; the framework remains a complete philosophy as well as a universal language and has remained unchanged over the past two thousand years [38].

Leibniz realised as early as 1690 that the balanced yin–yang structure proposed by Shao Yong (1050) was equivalent to the binary number system [33, 38]. However, the Western scientific community has mostly disregarded the I-Ching, due mainly to cultural and language barriers as well as local superstitions that cloud the essence of the framework. In its ancient form of allegories and metaphors, the I-Ching is unable to satisfy the logician's requirement for completeness, consistency and other such properties. The challenge then is to present this philosophy for modern times, in the language of mathematics. Stripped of its colourful language, what insights does this ancient system contain? What are the essential differences between modern binary logic and the I-Ching's dynamic binary structures? The unification of these two schools of thought would bring greater understanding of the world we live in [35]. As the modern formulation of Cellular Automata generates complexity through binary logic whilst the I-Ching analyses complexity through binary logic, the modern language of the I-Ching can be found in the creation of a structural definition of CA.

#### *1.2 Logic and Dynamic Systems*

In the field of mathematical logic, the construction of theoretical frameworks focuses upon three spatial hierarchies: variables, states and function spaces [6, 7]. Boolean algebra and switching theory exploit such properties, using the combinatorial invariance of the framework for implementing new theories and applications [8, 9]. Logical operations are restricted to two types of canonical forms, namely the product-of-sums and the sum-of-products forms. Any complex logic function can be rewritten in either of these two canonical forms. This is done for reasons of consistency, simplicity and symmetry of structure; as such, the use of a truth table enables analysis and the transformation into the canonical representations [6].

In the analysis of dynamic systems, it is essential to identify transformation spaces with functional invariance [16, 17]. The Ising model is arguably the simplest binary system that undergoes a nontrivial phase transition [14]. In modern physics, this type of model uses a structure linked to the phase space representation of dynamic systems [2]. The phase space plays an essential role in describing key properties of any dynamic system; however, under classical logic, phase characteristics are difficult to construct. A mechanism for linking low-level representations such as variables and states with higher level group properties such as symmetric conditions currently does not exist. This is more a limitation of the language and the operations allowed by the language. Classical logic is based on static combinatorial structures. Permutations, which are intrinsic to phase space, cannot be expressed under such a framework of classical combinatorial logic [14]. Cellular Automata frameworks [39], however, are fully dynamic and have been used to describe phase space [2]. Inspired by the traditional I-Ching hierarchical structures, new conditions, operations and relationships have been proposed on top of the classical logic framework to incorporate the dynamic nature of CA. The additional constructs provide support for CA using a framework that is logically consistent and complete [40].

The [40] proposal builds upon earlier studies of logic systems from a structural viewpoint. Kunii and Takai [41] applied an n-cell structure for analysis, classification and generation of visual objects using topology and homotopy tools in computer graphics [42–46]. Zheng and Maeder [47] proposed a balanced classification of binary images for conjugate classification and transformation of binary images on regular plane lattices in the 1990s to visualise different configurations [15, 48–50]. All such work used partial constructs of the [40] framework. The proposed framework supports classical logic, vector permutation and complementary operations. The new construction requires five spatial hierarchies containing $2^{2^n} \times 2^n!$ functional configurations for any $n$ variables. This structure is much larger than classical logic, which has three spatial hierarchies supporting $2^{2^n}$ functions for $n$ variables. Newly defined symmetric properties play an important role in predictions and classifications of possible recursive results. Using such properties, global behaviour can be identified and classified. A disadvantage of the new framework lies in its extreme complexity. It is possible to use parallel computers to analyse the configurations contained by $n = 3$ (the space already includes more than $10^7$ configurations). It is impossible using today's technology to process the $n = 5$ space due to the extreme growth of structural complexity ($2^{32} \times 32!$ configurations).
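The growth quoted above can be checked directly from the two counting formulas. The following is a minimal sketch; the function names are illustrative and do not come from the chapter.

```python
from math import factorial


def classical_function_count(n: int) -> int:
    """Number of Boolean functions of n variables: 2^(2^n)."""
    return 2 ** (2 ** n)


def functional_space_size(n: int) -> int:
    """Configurations under permutation and complement: 2^(2^n) * (2^n)!."""
    return classical_function_count(n) * factorial(2 ** n)


# Print n, the classical function count and the functional space size.
for n in range(1, 6):
    print(n, classical_function_count(n), functional_space_size(n))
```

For $n = 3$ the second count is $256 \times 40320 = 10\,321\,920$, which is the "more than $10^7$" figure mentioned in the text.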

This chapter describes a logic framework, using invariant characteristics of permutations and complementary operations to identify an invariant structure under such mixed operations. This allows the definition of a phase space to be introduced into logic. The transformation does not change the relevant function space. A proposed 2D representation provides additional properties to predict different behaviours from permutations that influence higher level structures in a logic functional space.

#### **2 Truth Table Representation for a Logic Function Space**

The proposed framework describes three levels of a logic function space and the truth table representation of the space.

#### *2.1 Basic Definitions*

$$\begin{aligned} f: X \to Y; \quad Y = f(X); \quad X, Y \in B_2^N\\ X = X_{N-1} X_{N-2} \dots X_j \dots X_1 X_0, \quad Y = Y_{N-1} Y_{N-2} \dots Y_j \dots Y_1 Y_0\\ X_j, Y_j \in B_2, \quad 0 \le j < N \end{aligned} \tag{1}$$

An example of a transform: the sequence $X = 0001110100$, $N = 10$, is an input for a function operation $f$; the output is a sequence of the same length, $Y = 1101011001$; $X, Y \in B_2^{10}$.

**Definition 1** Let $\dots X_j \dots$ be an $n$-bit structure:

$$\begin{aligned} \dots X\_j \dots &= \mathbf{x}\_{n-1} \mathbf{x}\_{n-2} \dots \mathbf{x}\_i \dots \mathbf{x}\_1 \mathbf{x}\_0 = \mathbf{x} \\ &\quad 0 \le i < n, \, 0 \le j < N, \mathbf{x} \in B\_2^n \end{aligned} \tag{2}$$

where $X_j = x_i$ is the corresponding position.

$$Y\_j = f(\dots X\_j \dots) = f\left(\mathbf{x}\_{n-1}\mathbf{x}\_{n-2}\dots\mathbf{x}\_i \dots \mathbf{x}\_1\mathbf{x}\_0\right) = f(\mathbf{x})\tag{3}$$

In Boolean logic, $n$ variables correspond to a full truth table with $2^n \times 2^{2^n}$ entries. The $I$th meta-state, $0 \le I < 2^n$, is an $n$-bit number occupying the $I$th column position; the $J$th function $T(J)$, $0 \le J < 2^{2^n}$, occupies the $J$th row with $2^n$ bits; and the function value of the $I$th entry is given by $T(J)_I$. The full table can be represented as follows (Table 1):


**Table 1** Truth Tables of *n*-variables

**Method 1**: Process Method of Truth Table

**Input**: **x**, the *n* variables as a {0, 1} sequence; *J*, the selected function number

**Process**: The input sequence **x** gives the meta-state number *I*, which selects the *I*-th column of function *T*(*J*)

**Output**: Return the value of $T(J)_I$ (1 for true and 0 for false).
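Method 1 can be sketched in a few lines of Python. The sketch assumes the sequential ordering in which entry $I$ of row $J$ is simply bit $I$ of the integer $J$; the function names are illustrative only.

```python
def meta_state_index(x: str) -> int:
    """Read the n-bit input x = x_{n-1}...x_0 as the meta-state number I."""
    return int(x, 2)


def truth_table_lookup(x: str, J: int) -> int:
    """Return T(J)_I, taken here as bit I of the function number J."""
    n = len(x)
    if not 0 <= J < 2 ** (2 ** n):
        raise ValueError("J must satisfy 0 <= J < 2^(2^n)")
    I = meta_state_index(x)
    return (J >> I) & 1


# Two-variable example: J = 8 (binary 1000) behaves as AND,
# J = 6 (binary 0110) behaves as XOR under this ordering.
for x in ("00", "01", "10", "11"):
    print(x, truth_table_lookup(x, 8), truth_table_lookup(x, 6))
```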

#### *2.2 Permutation Invariants*

**Proposition 1** (Sequential Mapping) *Under sequential order,* $T(J) = J$.

*Proof* The relevant output entries of *T* (*J* ) are mapped to the binary number *J* having 2*<sup>n</sup>* bits:

$$T(J) = T(S\_{2^n - 1}(J\_{2^n - 1})) \dots T(S\_I(J\_I)) \dots T(S\_0(J\_0))$$

$$= T(J)\_{2^n - 1} \dots T(J)\_I \dots T(J)\_0 = J \in B\_2^{2^n} \tag{4}$$

$$T(J)\_I = T(S\_I(J\_I)) = J\_I \in B\_2; 0 \le I < 2^n, 0 \le J < 2^{2^n}$$

**Definition 2** For any $n$ binary logic variables, let $\Omega(N)$ be the symmetric group on $N$ elements and let $P \in \Omega(2^n)$ be a permutation operator. Then for any $J$ there exists $K$ with $J, K \in B_2^{2^n}$, $P(T(J)) = K$, $0 \le J, K < 2^{2^n}$, and the permutation can be represented in Truth Table form:

$$\begin{aligned} P: J &\to K \\ P(T(J)) &= P(T(S_{2^n - 1}(J_{2^n - 1}))) \dots P(T(S_I(J_I))) \dots P(T(S_0(J_0))) \\ &= P(T(J)_{2^n - 1}) \dots P(T(J)_I) \dots P(T(J)_0) \\ &= K_{2^n - 1} \dots K_I \dots K_0 = K \in B_2^{2^n} \\ P(T(J)_I) &= P(T(S_I(J_I))) = T(S_{P(I)}(J_{P(I)})) \\ &= T(J)_{P(I)} = J_{P(I)} = K_I \in B_2 \\ &0 \le I < 2^n, \ 0 \le J, K < 2^{2^n}, \ P \in \Omega(2^n) \end{aligned} \tag{5}$$

**Proposition 2** *The Truth Table under permutation operations on the* $2^n$ *meta-states can generate* $2^n!$ *sequences of integers, each of length* $2^{2^n}$.

*Proof* The $2^n$ meta-states are independent, so every $P \in \Omega(2^n)$ gives a valid arrangement, and $\Omega(2^n)$ has $2^n!$ elements.

For the one-variable condition (i.e. $n = 1$), there are only two possible arrangements. The initial sequence is represented as $\mathbf{S} = S_1 S_0 = 10$, and a permutation operation generates the output $P(\mathbf{S}) = S_0 S_1 = 01$. The following shows two groups of results:


For any permutation operation, the function $T(J) = P(T(J))$ is always invariant. The inequality $J \neq K = P(J)$ holds in general.
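The one-variable case can be reproduced with a short sketch that applies $K_I = J_{P(I)}$ bit by bit. Representing $P$ as a Python list in which `P[I]` is the source position is an assumption made for illustration.

```python
def permute_function(J: int, P: list) -> int:
    """Return K with K_I = J_{P(I)} for every meta-state position I."""
    K = 0
    for I, source in enumerate(P):
        K |= ((J >> source) & 1) << I
    return K


# n = 1: two meta-states, and the only non-trivial permutation swaps them.
swap = [1, 0]
for J in range(4):
    print(J, "->", permute_function(J, swap))
```

The four function numbers are only rearranged (1 and 2 exchange places while 0 and 3 stay fixed), so the function space as a set is unchanged even though $J \neq K$ for some rows.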

#### **3 Fourth Level of Organisation**

Building upon the three levels (variables, states and functions), a fourth level of organisation is introduced.

#### *3.1 Complementary Operation*

**Definition 3** (Complementary Operator) For any binary (0–1) variable $y \in B_2$, let the relevant index $\delta \in B_2$ be a complementary operator:

$$\mathbf{y}^{\delta} = \begin{cases} \bar{\mathbf{y}} & \delta = 0 \\ \mathbf{y} & \delta = 1 \end{cases} \tag{6}$$

**Definition 4** (Complementary Function Operation) For any $n$-variable function with $2^n$ meta function vectors $\mathbf{S} = S_{2^n-1} \dots S_I \dots S_0$, let $\Delta = \delta_{2^n-1} \dots \delta_I \dots \delta_0$, $0 \le I < 2^n$, $\delta_I \in B_2$, $\Delta \in B_2^{2^n}$.

For this type of complementary operations on function, Δ is

$$\Delta: T(J) \to K; J, K \in B\_2^{2^n}, 0 \le J, K < 2^{2^n}$$

$$\mathbf{S}^{\Delta} = S\_{2^n - 1}^{\delta\_{2^n - 1}} \dots S\_I^{\delta\_I} \dots S\_0^{\delta\_0}, S\_I \in B\_2^n$$

$$T(J)^{\Delta} = T(S\_{2^n - 1}^{\delta\_{2^n - 1}}(J\_{2^n - 1})) \dots T(S\_I^{\delta\_I}(J\_I)) \dots T(S\_0^{\delta\_0}(J\_0))$$

$$= T(J)\_{2^n - 1}^{\delta\_{2^n - 1}} \dots T(J)\_I^{\delta\_I} \dots T(J)\_0^{\delta\_0} \tag{7}$$

$$= K\_{2^n - 1} \dots K\_I \dots K\_0 = K \in B\_2^{2^n}$$

$$T(J)\_I^{\delta\_I} = T(S\_I^{\delta\_I}(J\_I)) = J\_I^{\delta\_I} = K\_I \in B\_2$$

$$0 \le I < 2^n, 0 \le J, K < 2^{2^n}, \delta\_I \in \Delta$$
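Because $\delta_I = 1$ keeps a bit and $\delta_I = 0$ inverts it, the complementary operation on a whole row reduces to a bitwise XNOR of $J$ and $\Delta$. A minimal sketch, with an illustrative function name and an explicit `width` parameter standing for $2^n$:

```python
def complement_function(J: int, Delta: int, width: int) -> int:
    """Return K with K_I = J_I when delta_I = 1 and K_I = not J_I when delta_I = 0."""
    mask = (1 << width) - 1
    return ~(J ^ Delta) & mask  # bitwise XNOR restricted to `width` bits


# Two-variable example (width = 4): J = 0110 under three choices of Delta.
for Delta in (0b1111, 0b0000, 0b1010):
    K = complement_function(0b0110, Delta, 4)
    print(format(Delta, "04b"), format(K, "04b"))
```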

#### *3.2 Invariant Logic Functions Under Permutation and Complementary Operations*

**Definition 5** (Permutation and Complementary Operations) For any $n$ variables expressed as $2^n$ meta vectors, Complementary Operations $\Delta \in B_2^{2^n}$ and Permutation Operations $P \in \Omega(2^n)$ are expressed as

$$\begin{aligned} (P, \Delta) : T(J) &\to K; \quad J, K \in B_2^{2^n}, \ P \in \Omega(2^n), \ \Delta \in B_2^{2^n} \\ P(T(J)^\Delta) &= P(T(S_{2^n - 1}^{\delta_{2^n - 1}}(J_{2^n - 1}))) \dots P(T(S_I^{\delta_I}(J_I))) \dots P(T(S_0^{\delta_0}(J_0))) \\ &= P(T(J)_{2^n - 1}^{\delta_{2^n - 1}}) \dots P(T(J)_I^{\delta_I}) \dots P(T(J)_0^{\delta_0}) \\ &= K_{2^n - 1} \dots K_I \dots K_0 = K \in B_2^{2^n} \\ P(T(J)_I^{\delta_I}) &= P(T(S_I^{\delta_I}(J_I))) = J_{P(I)}^{\delta_{P(I)}} = K_I \in B_2 \\ &0 \le I < 2^n, \ 0 \le J, K < 2^{2^n}, \ P \in \Omega(2^n), \ \delta_I \in \Delta \end{aligned} \tag{8}$$

#### *3.3 Logic Functional Spaces*

**Theorem 1** (Logic Function Invariants under Permutation & Complementary Operations) *For any logic function, the output of Method 2 provides an equivalent output as the original Truth Table under all conditions.*

*Proof* The $J$th row of the permutation and complementary table $P(T^{\Delta})$, for any $I \in B_2^n$, $J \in B_2^{2^n}$, is constructed by

$$P(T(J)_I^{\Delta}) = T(J)_{P(I)}^{\delta_{P(I)}} = \begin{cases} \neg T(J)_I & \delta_{P(I)} = 0\\ T(J)_I & \delta_{P(I)} = 1 \end{cases} \tag{9}$$


After using Method 2, the results are shown:

$$P(T(J)\_I^\Delta) = \begin{cases} \neg \neg T(J)\_I = T(J)\_I & \delta\_{P(I)} = 0 \\ T(J)\_I & \delta\_{P(I)} = 1 \end{cases} \tag{10}$$

**Theorem 2** (Permutation Group for Meta Function Vector) *For* $2^n$ *meta function vectors, the total number of permutations is* $2^n!$.

**Theorem 3** (Permutation & Complementary Structure) *Under permutation and complementary operations, a total of* $2^n! \times 2^{2^n}$ *permutations can be generated to form a logic functional space for the* $n$ *variables.*
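The count in Theorem 3 can be illustrated by brute-force enumeration for small $n$. The sketch below combines the rule $K_I = J_{P(I)}^{\delta_{P(I)}}$ with every pair $(P, \Delta)$; the data layout (a dictionary keyed by the pair) and the function names are assumptions chosen for illustration.

```python
from itertools import permutations, product


def apply_p_delta(J: int, P: tuple, delta: tuple) -> int:
    """K_I = J_{P(I)} if delta_{P(I)} == 1, otherwise the inverted bit."""
    K = 0
    for I, source in enumerate(P):
        bit = (J >> source) & 1
        if delta[source] == 0:
            bit ^= 1
        K |= bit << I
    return K


def functional_space(n: int) -> dict:
    """Map every (P, Delta) pair to the resulting sequence of function numbers."""
    width = 2 ** n
    space = {}
    for P in permutations(range(width)):
        for delta in product((0, 1), repeat=width):
            space[(P, delta)] = [apply_p_delta(J, P, delta) for J in range(2 ** width)]
    return space


print(len(functional_space(1)), len(functional_space(2)))
```

For $n = 1$ this yields $2! \times 2^2 = 8$ sequences and for $n = 2$ it yields $4! \times 2^4 = 384$; every sequence is a rearrangement of the same set of function numbers.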

#### **4 Different Coding Schemes: One- and Two-Dimensional Representations**

The initial step is to construct a series of logic functionals. Permutation and complementary differences can be shown in the proposed invariant function structures. Different coding schemes under different symmetric restrictions are established. Four schemes are described: one uses a one-dimensional representation and the other three use two-dimensional representations. For binary sequences in sequential counting order, the scheme is known as the SL (Shao Yong & Leibniz) coding scheme.

#### *4.1 G Coding*

The General Code (G) is used to map permutation & complementary operations. For any state in the G coding scheme having 2*<sup>n</sup>* bits,

$$G: (J, \Delta, P) \to K; \quad J, K \in B_2^{2^n}; \ \Delta \in B_2^{2^n}, \ P \in \Omega(2^n). \tag{11}$$

#### *4.2 W Coding*

In the G coding scheme, the bits of each number are separated into two parts of equal length to form a 2D representation. This mapping mechanism can represent a function space as a W coding scheme.

$$\begin{aligned} W: (J, \Delta, P) \to K = \langle J^1 | J^0 \rangle \\ J, K \in B_2^{2^n}; \ J^1, J^0 \in B_2^{2^{n-1}}; \ S^1, S^0 \in \mathbf{S}, \ \Delta \in B_2^{2^n}, \ P \in \Omega(2^n) \end{aligned} \tag{12}$$

Under this representation, a given logic functional for the function space is illustrated as a fixed matrix.

$$\{W(J)\}_{J=0}^{2^{2^n}-1} = \begin{array}{|c|c|c|c|c|} \hline \langle 0|0\rangle & \dots & \langle 0|J^{0}\rangle & \dots & \langle 0|2^{2^{n-1}}-1\rangle \\ \hline \dots & \dots & \dots & \dots & \dots \\ \hline \langle J^{1}|0\rangle & \dots & \langle J^{1}|J^{0}\rangle & \dots & \langle J^{1}|2^{2^{n-1}}-1\rangle \\ \hline \dots & \dots & \dots & \dots & \dots \\ \hline \langle 2^{2^{n-1}}-1|0\rangle & \dots & \langle 2^{2^{n-1}}-1|J^{0}\rangle & \dots & \langle 2^{2^{n-1}}-1|2^{2^{n-1}}-1\rangle \\ \hline \end{array} \tag{13}$$

$$0 \le J^0, J^1 < 2^{2^{n-1}}; \quad 0 \le J < 2^{2^n}$$

In the one-variable condition, there are eight cases in their logic functional spaces as follows:


For better visualisation and expression, the one-dimensional G coding scheme is converted into a two-dimensional W coding scheme.
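The conversion from a one-dimensional number to a two-dimensional position can be sketched as a split into two bit halves. Taking $J^1$ as the upper half and using it as the row index is an assumption; the function names are illustrative.

```python
def w_code(K: int, n: int) -> tuple:
    """Split a 2^n-bit number K into (J1, J0), each holding 2^(n-1) bits."""
    half = 2 ** (n - 1)
    mask = (1 << half) - 1
    return (K >> half) & mask, K & mask


def w_matrix(n: int) -> list:
    """Place every number K at row J1, column J0 of a square grid."""
    size = 2 ** (2 ** (n - 1))
    grid = [[None] * size for _ in range(size)]
    for K in range(2 ** (2 ** n)):
        J1, J0 = w_code(K, n)
        grid[J1][J0] = K
    return grid


# Two-variable case: 16 numbers arranged as a 4 x 4 grid.
for row in w_matrix(2):
    print(row)
```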


#### *4.3 F Coding*

Using the 2D representation, a symmetric condition can be added to arrange the meta-states in a specific order. If each pair of states in W satisfies the following condition, a refined code, the F coding scheme, is determined.

$$\begin{array}{ccc} J^1 & \text{the } I \text{th meta-state} & \Longleftrightarrow & J^0 \text{ the } I \text{th meta-state} \\ & \updownarrow & \text{F coding scheme} & \updownarrow \\ X \in S^1 & \implies & \bar{X} \in S^0 \end{array}$$
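One possible reading of this condition is that the $I$th meta-state in the first half of the ordering must be the bitwise complement of the $I$th meta-state in the second half. The sketch below tests that reading on four two-variable orderings; the pairing of halves by position and the function names are assumptions, not something the chapter specifies.

```python
def halves(P: tuple) -> tuple:
    """Split the ordered meta-states into (S1, S0): first and second half as written."""
    mid = len(P) // 2
    return P[:mid], P[mid:]


def is_f_coding(P: tuple, n: int) -> bool:
    """True when each state in S1 is the n-bit complement of its partner in S0."""
    mask = (1 << n) - 1
    S1, S0 = halves(P)
    return all((~a & mask) == b for a, b in zip(S1, S0))


for P in [(3, 2, 1, 0), (2, 3, 0, 1), (2, 3, 1, 0), (3, 1, 0, 2)]:
    print(P, is_f_coding(P, n=2))
```

Under this reading, the orderings (3210) and (2301) fail the test while (2310) and (3102) pass it.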

#### *4.4 C Coding*

In addition to a pair of states in complementary relationship, further structure is introduced onto F code. When the pair of states in F have the same values in their *i*th position, they form a C coding scheme.


The C coding scheme has the strongest symmetric conditions available. Only a relatively small number among the three invariant groups can be identified within this scheme.

#### **5 Two-Variable Cases**

Four groups of the proposed schemes are selected as examples. Each group of a logic functional represents 16 logic functions as 4×4 images. These are arranged as 2×2 blocks by their Truth/False and Δ-Variant/Δ-Invariant properties. The 2×2 blocks correspond to:

$$\begin{array}{|c|c|} \hline \text{Truth Block} & \Delta\text{-Variant} \\ \hline \Delta\text{-Invariant} & \text{False Block} \\ \hline \end{array}$$

Each block contains 16 entries of function images in a 4×4 ($2^2 \times 2^2$) configuration. Each image entry denotes a transformed number $K = \langle J^1|J^0\rangle$ and its function number $J$. In all four figures, panel (a) shows the 2×2 base blocks that represent function images and panel (b) shows the 2×2 vector blocks that represent the relevant coding schemes.

In Fig. 1, the counting order of meta-states has been arranged as W coding (SL code): *P* = (3210), *P*(Δ) = 1010. In this group, only Functions 6 and 9 can be observed in complementary symmetric condition in main diagonal direction.

In Fig. 2, a variation of the configuration under W coding, *P* = (2301), *P*(Δ) = 0101, creates effects similar to those seen in Fig. 1.

In Fig. 3, the F coding scheme is shown; under this configuration, *P* = (2310), *P*(Δ) = 0110. Six pairs (0:15, 1:7, 2:11, 4:13, 6:9, 8:14) of complementary functions can be identified. The group has four blocks containing the same pairs of configurations.

In Fig. 4, C coding is represented: *P* = (3102), *P*(Δ) = 1100. In addition to the same six pairs as in F coding, the four corners of every block hold the four functions (0, 5, 10, 15). This gives the most regular structure of all the coding schemes.


**Fig. 1** W coding (SL code): *P* = (3210), *P*(Δ) = 1010; **a** 2×2 base blocks **b** 2×2 vector blocks


**Fig. 2** W coding: *P* = (2301), *P*(Δ) = 0101; **a** 2×2 base blocks **b** 2×2 vector blocks


**Fig. 3** F coding: *P* = (2310), *P*(Δ) = 0110; **a** 2×2 base blocks **b** 2×2 vector blocks


**Fig. 4** C coding: *P* = (3102), *P*(Δ) = 1100; **a** 2×2 base blocks **b** 2×2 vector blocks

#### **6 Conclusion**

It is shown in this chapter that arranging the binary function space using four levels of classification adds symmetry and regular structure to the entire space of binary functions. For ease of visualisation, it is convenient to apply a 2D representation mechanism that enables symmetric configurations of the system to be analysed via different coding schemes. Binary functional spaces provide additional optimal information to generate large numbers of potential configurations in order to arrange and organise logic phase spaces.

The mechanism can be developed further to establish a solid logic foundation on logic functional levels for theoretical explorations and practical applications. We aim to make a refined investigation of different coding schemes within the highest levels of organisation in our future work.

**Acknowledgements** Thanks to Mr. J. Wan for generating all sample images and configurations and to Dr. D. Heim for editing the chapter. Financial support was given by the School of Software, Yunnan University.

#### **References**



# **Hierarchical Organization of Variant Logic**

**Jeffrey Zheng**

**Abstract** In modern logic, various systems have been proposed extending classical Boolean logic & switching theory. Such logic frameworks include multiple-valued logic, probability logic, fuzzy logic, module logic, quantum logic and various other frameworks. Although these extensions have been applied to many applications in mathematics, in science and in engineering, all extensions to Boolean logic invalidate at least one of the six fundamental rules of Boolean logic shown in L1 to L6. We propose a new framework of logic, variant logic, extending Boolean logic whilst satisfying the six fundamental rules (L1–L6). By defining the Variant–Invariant behaviour of logical operations, this framework can be constructed using four types of general operators. The main results of the chapter are summarized in **Theorems 8–10**. To show significant differences between classical logic and the new variant logic, invariant properties of this hierarchical organization are discussed. The simplest cases of one-variable conditions are illustrated. Variant logic can provide the necessary framework to support analysis and description of Cellular Automata, Fractal Theory, Chaos Theory and other systems dealing with complexity. Such applications of this framework will be explored in future papers.

**Keywords** Switching theory · Boolean/multiple valued/probability/fuzzy logic · Variant/invariant property · Hierarchical organization · Variant logic

J. Zheng (B)

J. Zheng Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_2

#### **1 Laws of Logic Systems**

#### *1.1 Laws in Classical Logic Systems*

Classical logic identifies a class of formal logics that are characterized by a number of properties [1–17].

**Definition 1** For any logic system if all CL1–CL5 are satisfied, then it is a classical logic system. The five properties of classical logic (CL1–CL5) are listed as follows:

CL1: Law of the excluded middle and double negative elimination

CL2: Law of non-contradiction

CL3: Monotonicity and idempotency of entailment

CL4: Commutativity of conjunction

CL5: De Morgan duality

Examples of such classical logic systems include works of philosophy and religion (Aristotle's Organon; Nagarjuna's tetralemma; and Avicenna's temporal modal logic) as well as foundational logic systems such as the reformulations by George Boole and Gottlob Frege [4–17]. These properties can be rewritten as simplified equations describing basic properties of a logic system using characteristics of the five classical properties. The following equations (L1–L6) describe such a system.

L1: $P \cup P = P$ (Idempotency)

L2: $P \cap P = P$ …

L3: $\neg P \cup P = 1$ (Excluded Middle)

L4: $\neg P \cap P = 0$ …

L5: $\neg\neg P = P$ (Double Negative Elimination)

L6: $P, P \to Q$

The set of equations can be applied in the analysis of modern logic systems to determine whether they are all satisfied. The equations are defined as canonical properties, and a logic system satisfying all six properties is defined as a canonical system. Any logic system that does not satisfy all six is categorized as non-canonical.

#### *1.2 Current Logic Systems*

Many modern logic systems cannot satisfy the six canonical properties. Three-valued logic, proposed by Łukasiewicz in 1920, can satisfy L3–L6 but cannot satisfy L1–L4. Probability logic, proposed by Reichenbach in 1949, can satisfy L5–L6 but cannot satisfy L1–L4. Fuzzy logic, proposed by Zadeh in 1965, satisfies L1, L2, L5 and L6 but cannot satisfy L3–L4. Since they cannot satisfy all canonical properties, they are all non-canonical logic systems [1–22].
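The fuzzy-logic case can be checked numerically. The sketch below assumes the usual max/min/$1 - p$ operators for fuzzy union, intersection and negation, and reads L3 and L4 in their standard forms ($\neg P \cup P = 1$ and $\neg P \cap P = 0$); L6 is an inference rule and is not tested. All names are illustrative.

```python
def check_laws(values, union, intersection, negation, top=1, bottom=0):
    """Evaluate L1 to L5 over every value in `values` and report which laws hold."""
    return {
        "L1": all(union(p, p) == p for p in values),
        "L2": all(intersection(p, p) == p for p in values),
        "L3": all(union(negation(p), p) == top for p in values),
        "L4": all(intersection(negation(p), p) == bottom for p in values),
        "L5": all(negation(negation(p)) == p for p in values),
    }


def fuzzy_not(p):
    """Standard fuzzy negation, assumed here for illustration."""
    return 1 - p


boolean = check_laws((0, 1), max, min, fuzzy_not)
fuzzy = check_laws((0, 0.25, 0.5, 0.75, 1), max, min, fuzzy_not)
print("Boolean:", boolean)
print("Fuzzy:  ", fuzzy)
```

On the two Boolean values all five laws hold, whereas on the fuzzy sample L3 and L4 fail (for example at $p = 0.5$) while L1, L2 and L5 still hold.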

#### **2 Truth Valued Representation in Boolean Logic Systems**

For any $n$-variable Boolean logic system, it is natural to establish $2^n$ states. By marking each state as either selected or not selected, a truth table can be built for a given Boolean function. Collecting all possible selections, a full truth table is constructed with $2^n$ columns and $2^{2^n}$ rows. We can list this table as follows:


where there are three parameters $i$, $I$, $J$, with $0 \le i < n$, $0 \le I < 2^n$, $0 \le J < 2^{2^n}$, corresponding to variable, state and function numbers, respectively. Under such conditions, for any $J$, it is convenient to use a Karnaugh map or related logic tools to construct the given Boolean function in combination [6–17].

#### **3 Cellular Automata Representations**

Cellular Automata (CA) use a different mechanism [23–35] to represent a given function. In a one-dimensional form of CA, an $N$-length binary sequence is

$$X = X_{N-1}X_{N-2}\dots X_j \dots X_1X_0, \quad 0 \le j < N, \quad X_j \in \{0, 1\} = B_2$$

For a given function *f* , the output sequence is defined as follows: *f* : *X* → *Y*, *Y* = *f* (*X*),

$$Y = Y_{N-1}Y_{N-2}\dots Y_j \dots Y_1Y_0, \quad 0 \le j < N, \quad Y_j \in B_2$$

It is feasible to use a moving window of fixed length $n$ to extract from $X$ a local kernel of length $n$. The kernel can be presented as

$$[\dots X_j \dots] = \mathbf{x}_{n-1} \dots \mathbf{x}_i \dots \mathbf{x}_0, \quad \mathbf{x}_i \in B_2$$

For a given function *f*

$$\mathbf{y} = f(\mathbf{x}\_{n-1} \dots \mathbf{x}\_i \dots \mathbf{x}\_0),$$

It is necessary to assign a certain position *i* in the kernel for special care to associated with *j* position of both sequences. We have

$$\mathbf{y} = f(\mathbf{x}\_{n-1} \dots \mathbf{x}\_i \dots \mathbf{x}\_0) = f(\dots X\_j \dots) == Y\_j$$

or, writing *X<sub>j</sub>* = *X<sub>j</sub><sup>t−1</sup>* and *Y<sub>j</sub>* = *X<sub>j</sub><sup>t</sup>*, i.e.

$$f: X\_j^{t-1} \to X\_j^t, X\_j^{t-1}, X\_j^t \in B\_2$$
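The moving-window update can be sketched as follows. The sketch assumes a periodic boundary and a function number *J* whose bit *I* gives the output for kernel state *I*; neither choice is fixed by the text, and `ca_step` is a name introduced here for illustration.

```python
def ca_step(X: list[int], J: int, n: int = 3, i: int = 1) -> list[int]:
    """One synchronous update Y = f(X) of a 1D binary cellular automaton.

    X[j] holds X_j. Kernel cell x_k is read from X_{j + k - i}, so kernel
    position i sits on X_j. Bit I of J is the output for kernel state I.
    Assumptions of this sketch: periodic boundary, LSB-first function number.
    """
    if not 0 <= i < n or not 0 <= J < 2 ** (2 ** n):
        raise ValueError("need 0 <= i < n and 0 <= J < 2**(2**n)")
    N = len(X)
    Y = [0] * N
    for j in range(N):
        # Read the kernel around position j as an integer state number.
        state = sum(X[(j + k - i) % N] << k for k in range(n))
        Y[j] = (J >> state) & 1
    return Y
```

With *n* = 3 and *i* = 1 this numbering coincides with the common elementary-CA rule numbers, so `ca_step(X, 204)` returns `X` unchanged and `ca_step(X, 51)` returns its complement.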

#### **4 Variant Construction**

#### *4.1 Four Variation Forms*

Considering *f* : *X<sub>j</sub><sup>t−1</sup>* → *X<sub>j</sub><sup>t</sup>* for any function of a Boolean logic system in order to analyse its variation properties [36–40], we have the following proposition.

**Proposition 1** *For any transformation f* : *X<sub>j</sub><sup>t−1</sup>* → *X<sub>j</sub><sup>t</sup>, four transforming classes are identified: TA* : 0 → 0*, TB* : 0 → 1*, TC* : 1 → 0*, TD* : 1 → 1*.*

*Proof* *X<sub>j</sub>* and *Y<sub>j</sub>* are 0-1 variables, so only the four listed classes are possible.

**Definition 2** The four transforming classes correspond to the following sets: TA: invariant class for 0 value; TB: variant class for 0 value; TC: variant class for 1 value; TD: invariant class for 1 value.
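The four classes can be read off directly from a pair of consecutive sequences. The following is a minimal sketch; the helper name `classify` and the list representation are assumptions made for illustration.

```python
# Map each (previous bit, current bit) pair to its transforming class.
CLASSES = {(0, 0): "TA", (0, 1): "TB", (1, 0): "TC", (1, 1): "TD"}


def classify(prev: list[int], curr: list[int]) -> list[str]:
    """Label every position j with the class of the transition X_j^{t-1} -> X_j^t."""
    if len(prev) != len(curr):
        raise ValueError("both sequences must have the same length")
    return [CLASSES[(a, b)] for a, b in zip(prev, curr)]


def variant_positions(prev: list[int], curr: list[int]) -> list[int]:
    """Positions whose value changed, i.e. those in class TB or TC."""
    return [j for j, c in enumerate(classify(prev, curr)) if c in ("TB", "TC")]
```

For instance, `classify([0, 0, 1, 1], [0, 1, 0, 1])` visits each class once, in the order TA, TB, TC, TD.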

Under such definition, the following proposition can be established.

**Proposition 2** *Using four classes of transformation, four variant operations are defined.*


*Proof* Truth (False) values are determined by *Y<sub>j</sub>* (*Ȳ<sub>j</sub>*), and Variant (Invariant) values are determined by {TB, TC} for 1 (0) and {TA, TD} for 0 (1), respectively.

**Theorem 1** *Among the {Truth, Variant, Invariant, False} groups, only two pairs of groups, {Truth, False} and {Variant, Invariant}, satisfy L1–L6 to form a canonical logic system.*

*Proof* Both pairs are composed of 0-1 variables; in addition, Truth/False and Variant/Invariant form complement relationships. Other combinations contain common parts, so they cannot satisfy the canonical logic conditions L1–L6.

**Definition 3** The sequential binary number is defined as SL coding, in recognition of the contributions of Y. Shao and Leibniz [41–49] to binary logic.

**Definition 4** The operator *BN* : *J* → *B* converts an integer to its binary representation. The operator *DC* : *B* → *J* converts a binary number to its decimal representation.
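A minimal sketch of the two operators, assuming binary numbers are held as strings of `0` and `1` with the most significant bit first. The explicit `width` parameter is an addition made here so that leading zeros are kept.

```python
def BN(J: int, width: int) -> str:
    """Convert integer J to a binary string of exactly `width` bits."""
    if width < 1 or not 0 <= J < 2 ** width:
        raise ValueError("J does not fit into the requested width")
    return format(J, f"0{width}b")


def DC(B: str) -> int:
    """Convert a binary string back to its decimal (integer) value."""
    return int(B, 2)
```

The two operators are mutually inverse for a fixed width: `DC(BN(J, width)) == J` for every admissible `J`.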

**Definition 5** The SL coding scheme is an ordering of binary table outputs *T* : *B*<sub>2</sub><sup>2<sup>*n*</sup></sup> → *J*. An element *J<sub>I</sub>* ∈ *SL* at position *I*, where 0 ≤ *I* < 2<sup>*n*</sup>, represents function *T<sub>I</sub>* such that the binary representation of *T<sub>I</sub>* is defined as

$$BN(J) = T\_{2^n-1}[J\_{2^n-1}] \dots T\_I[J\_I] \dots T\_0[J\_0]$$

For any *n*-variable structure, *J* is composed of 2<sup>*n*</sup> bits to represent the numbers 0 ≤ *J* < 2<sup>2<sup>*n*</sup></sup>.

**Definition 6** A G coding scheme is defined as an ordering of binary table outputs *T* : *B*<sub>2</sub><sup>2<sup>*n*</sup></sup> → *J*. An element *J<sub>I</sub>* ∈ *SL* at position *I*, where 0 ≤ *I* < 2<sup>*n*</sup>, represents function *T<sub>I</sub>* such that the binary representation of *T<sub>I</sub>* is defined as

$$G = \{ \forall J | T(J), 0 \le J < 2^{2^n} \};$$

$$T(J) = T\_{2^n-1}[Y(J\_{2^n-1})] \dots T\_I[Y(J\_I)] \dots T\_0[Y(J\_0)], \quad 0 \le I < 2^n$$

where {*Y*(*J<sub>I</sub>*), 0 ≤ *I* < 2<sup>*n*</sup>} are 0-1 vectors of length 2<sup>2<sup>*n*</sup></sup>, *Y*(*J*<sub>2<sup>*n*</sup>−1</sub>) = ... = *Y*(*J<sub>I</sub>*) = ... = *Y*(*J*<sub>0</sub>), respectively.

Under the G coding scheme, the ordering number is an integer sequence with 2<sup>2<sup>*n*</sup></sup> positions. Different transformations make this sequence extremely complex. For convenience of representation, a two-dimensional W coding scheme is proposed.

**Definition 7** A W coding scheme is defined as an ordering pair of binary table outputs *T* : *B*<sub>2</sub><sup>2<sup>*n*</sup></sup> → ⟨*J*<sup>1</sup>|*J*<sup>0</sup>⟩. Each component is composed of 2<sup>*n*−1</sup> bits:

$$\langle J^1 | J^0 \rangle = T\_{2^n - 1} [Y(J\_{2^n - 1})] \dots T\_I [Y(J\_I)] \dots T\_0 [Y(J\_0)], 0 \le I < 2^n$$

$$J^0 = \{ \forall I \mid BN(J\_I \bmod 2^{n - 1}), \ 0 \le I < 2^{n - 1} \}$$

$$J^1 = \{ \forall I \mid BN(J\_I \bmod 2^{n - 1}), \ 2^{n - 1} \le I < 2^n \}$$

Under this construction, a G coding scheme is transformed into a W coding scheme that represents a two-dimensional structure for different permutation results. In general, *J*<sup>0</sup> represents the lower 2<sup>*n*−1</sup> bits and *J*<sup>1</sup> represents the higher 2<sup>*n*−1</sup> bits, respectively. The general structure of W coding is a 2<sup>2<sup>*n*−1</sup></sup> × 2<sup>2<sup>*n*−1</sup></sup> matrix, shown in the following figure.


Figure: 2D space ⟨*J*<sup>1</sup>|*J*<sup>0</sup>⟩ for the 2<sup>2<sup>*n*</sup></sup> functions, with 0 ≤ *J*<sup>0</sup>, *J*<sup>1</sup> < 2<sup>2<sup>*n*−1</sup></sup>.
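For the simplest case, where the meta states keep their natural order (no permutation applied, an assumption of this sketch), the split of a function number into its two W coordinates is a plain bit split. The name `w_split` is introduced here for illustration.

```python
def w_split(J: int, n: int) -> tuple[int, int]:
    """Split a 2^n-bit function number J into (J1, J0).

    J1 is read from the higher 2^(n-1) bits and J0 from the lower 2^(n-1)
    bits, so each coordinate lies in the range 0 .. 2^(2^(n-1)) - 1.
    """
    if n < 1 or not 0 <= J < 2 ** (2 ** n):
        raise ValueError("need n >= 1 and 0 <= J < 2**(2**n)")
    half = 2 ** (n - 1)                      # number of bits per coordinate
    return J >> half, J & ((1 << half) - 1)
```

For *n* = 2 the sixteen function numbers fill a 4 × 4 matrix with one function per cell, for example `w_split(0b1011, 2) == (2, 3)`.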

#### *4.2 Complement and Variant Operators*

**Definition 8** In *B*<sub>2</sub><sup>*n*</sup>, the generalized complement *Y<sup>Q</sup>*, *Q* ∈ *B*<sub>2</sub><sup>2<sup>*n*</sup></sup>, of a variable *Y* is defined as the element obtained by complementing the components of *Y* according to the values of the corresponding components of *Q*: *Y<sub>I</sub>* is complemented if *Q<sub>I</sub>* is 0 and left un-complemented if *Q<sub>I</sub>* is 1, where *Y<sub>I</sub>* and *Q<sub>I</sub>* designate the *I*th components of *Y* and *Q*.

For example, in *B*<sub>2</sub><sup>4</sup>, the results for *Q* ∈ {0101, 0110} are as follows:


Applying the *Q* operator to the 2<sup>*n*</sup> meta vectors generates a vector family.

**Proposition 3** *In B*<sub>2</sub><sup>2<sup>*n*</sup></sup>*, the generalized complement operator Q* ∈ *B*<sub>2</sub><sup>2<sup>*n*</sup></sup> *has* 2<sup>2<sup>*n*</sup></sup> *different cases.*

*Proof* *Q* is a 2<sup>*n*</sup>-bit vector and each position can be selected as 0 or 1, so the total number of selections is 2<sup>2<sup>*n*</sup></sup>.
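The generalized complement and the count in Proposition 3 can be illustrated with bit strings. This is a sketch under the assumption that vectors are written as strings of `0` and `1`; the function names are chosen here and are not part of the chapter.

```python
from itertools import product


def generalized_complement(Y: str, Q: str) -> str:
    """Complement Y_I where Q_I is 0 and keep Y_I where Q_I is 1."""
    if len(Y) != len(Q):
        raise ValueError("Y and Q must have the same length")
    flip = {"0": "1", "1": "0"}
    return "".join(y if q == "1" else flip[y] for y, q in zip(Y, Q))


def all_operators(n: int) -> list[str]:
    """Every Q operator for n variables, i.e. every 2^n-bit vector."""
    return ["".join(bits) for bits in product("01", repeat=2 ** n)]
```

An all-ones `Q` leaves `Y` unchanged and an all-zeros `Q` complements every component, and `len(all_operators(2))` is 16 = 2<sup>2<sup>2</sup></sup>.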

**Definition 9** For the 2<sup>*n*</sup> meta states composed of vector Ψ, the *i*th vector Ψ(*i*), 0 ≤ *i* < *n*, has 2<sup>*n*</sup> bits. Four vectors of 2<sup>*n*</sup> bits, {**0**, Ψ(*i*), ¬Ψ(*i*), **1**}, can be selected as *Q* operators. This special form of *Q* operation is defined as the *QV* operation.

**Proposition 4** *For a QV operator, QV* ∈ {**0**, Ψ(*i*), ¬Ψ(*i*), **1**}*, the four QV vectors provide the following complement results in transformation, respectively:*

> **0**: *False operator*; **1**: *Truth operator*; Ψ(*i*): *Invariant operator*; ¬Ψ(*i*): *Variant operator*

*Proof* The **1** operator keeps the original truth table values; the **0** operator reverses all values; the Ψ(*i*) operator produces the invariant condition and the ¬Ψ(*i*) operator generates the variant property.

**Proposition 5** *Under QV operations,* 2<sup>*n*+1</sup> *cases are generated as a complement variant group.*

*Proof* Only 0 ≤ *i* < *n* is selected; each position has two selections associated with *i*, plus two constant vectors. So a total of 2 × 2<sup>*n*</sup> = 2<sup>*n*+1</sup> cases can be generated.

**Definition 10** For the 2<sup>*n*</sup> meta vectors *Y*, the *I*th component *Y*(*I*) ∈ *B*<sub>2</sub><sup>2<sup>2<sup>*n*</sup></sup></sup>, i.e. *Y*(*I*) has 2<sup>2<sup>*n*</sup></sup> bits. A permutation operator *P* moves the *I*th component to the *P*(*I*)th component for ∀*I*, 0 ≤ *I* < 2<sup>*n*</sup>, respectively.

**Proposition 6** *Applying the P operation to the* 2<sup>*n*</sup> *meta vectors in Y, a total of* 2<sup>*n*</sup>! *permutations can be generated.*

*Proof* The *P* operator is equivalent to a permutation of 2<sup>*n*</sup> integers. This generates a symmetric group containing 2<sup>*n*</sup>! members.

**Proposition 7** *Applying the Q and P operators to Y, a total of* 2<sup>2<sup>*n*</sup></sup> · 2<sup>*n*</sup>! *cases can be created. This creates a Complement Permutation Structure (CPS).*

*Proof* The *Q* and *P* operators are independent of each other, so their numbers of cases are multiplied together.
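To get a feeling for how quickly the count in Proposition 7 grows, it can be evaluated for small *n*. The sketch below only evaluates the product 2<sup>2<sup>*n*</sup></sup> · (2<sup>*n*</sup>)!; the function name is an assumption made here.

```python
from math import factorial


def cps_size(n: int) -> int:
    """Number of (Q, P) combinations: 2^(2^n) complement vectors times (2^n)! permutations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    states = 2 ** n
    return 2 ** states * factorial(states)


sizes = {n: cps_size(n) for n in (1, 2, 3)}
```

This gives 8 cases for one variable, 384 for two and 10 321 920 for three, which is why the later coding schemes restrict the structure.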

**Proposition 8** *Applying the QV and P operators to Y, a total of* 2<sup>*n*+1</sup> · 2<sup>*n*</sup>! *cases can be created. This creates a Complement Variant Structure (CVS).*

*Proof* The *QV* and *P* operators are independent of each other, so their numbers of cases are multiplied together.

#### *4.3 Other Global Coding Schemes*

Under *QV* + *P* and *Q* + *P* operations, more coding schemes can be defined.

**Definition 11** The F coding scheme is defined as a subset of W. A W code belongs to F if its meta states can be paired such that ∀*j*<sub>1</sub>, *j*<sub>1</sub> − 2<sup>*n*−1</sup> = *j*<sub>0</sub>, 0 ≤ *j*<sub>0</sub> < 2<sup>*n*−1</sup> ≤ *j*<sub>1</sub> < 2<sup>*n*</sup>, *I*<sub>*j*<sub>1</sub></sub> = *Ī*<sub>*j*<sub>0</sub></sub>, i.e. state *I*<sub>*j*<sub>1</sub></sub> is the complement of state *I*<sub>*j*<sub>0</sub></sub>.

F coding imposes restricted pair conditions on the structure. Its corresponding forms are as follows:

$$\begin{array}{ccc} J^1 \ j \text{-th meta state} & \Longleftrightarrow & J^0 \ j \text{-th meta state} \\ \updownarrow & & \text{F coding base} \\ X & \implies & \bar{X} \end{array}$$

**Definition 12** A coding scheme satisfies the general conjugate condition if ∀*I*<sub>*j*<sub>0</sub></sub> ∈ *I*<sub>*J*<sup>0</sup></sub>, for the selected position *i*, ∀*a<sub>i</sub>* ∈ *I*<sub>*j*<sub>0</sub></sub>, *a<sub>i</sub>* = 0, 0 ≤ *i* < *n*.

In other words, the general conjugate condition makes the selected position 0-valued in the lower part and 1-valued in the higher part, respectively.

**Definition 13** The C coding scheme is defined as the set of F codings for which ∀*I*<sub>*j*<sub>0</sub></sub> ∈ *I*<sub>*J*<sup>0</sup></sub>, for the selected position *i*, ∀*a<sub>i</sub>* ∈ *I*<sub>*j*<sub>0</sub></sub>, *a<sub>i</sub>* = 0, 0 ≤ *i* < *n*.

C coding imposes stronger restrictions, separating all 0-valued meta states into the lower part and all 1-valued meta states into the higher part.


Some coding samples are listed in the following table:


#### *4.4 Sizes of Variant Spaces*

**Definition 14** Under *QV* + *P* operations, W, F and C coding schemes are defined as WV, FV and CV coding schemes, respectively.

**Theorem 2** *For a W coding scheme of n variables, a total of* 2<sup>2<sup>*n*</sup></sup> · 2<sup>*n*</sup>! *cases are distinguished.*

**Theorem 3** *For a WV coding scheme of n variables, a total of* 2<sup>*n*+1</sup> · 2<sup>*n*</sup>! *cases are distinguished.*

**Theorem 4** *For an F coding scheme of n variables, a total of* 2<sup>2<sup>*n*</sup></sup> · 2<sup>2<sup>*n*−1</sup></sup> · 2<sup>*n*−1</sup>! = 2<sup>2<sup>*n*</sup>(1+1/2)</sup> · 2<sup>*n*−1</sup>! *cases are distinguished.*

**Theorem 5** *For an FV coding scheme of n variables, a total of* 2<sup>*n*+1</sup> · 2<sup>2<sup>*n*−1</sup></sup> · 2<sup>*n*−1</sup>! = 2<sup>2<sup>*n*−1</sup>+*n*+1</sup> · 2<sup>*n*−1</sup>! *cases are distinguished.*

**Theorem 6** *For a C coding scheme of n variables, a total of* 2<sup>2<sup>*n*</sup></sup> · 2<sup>*n*−1</sup>! *cases are distinguished.*

**Theorem 7** *For a CV coding scheme of n variables, a total of* 2<sup>*n*+1</sup> · 2<sup>*n*−1</sup>! *cases are distinguished.*

Using the definitions of the different coding schemes, various sequences for the one-variable case are shown in the following table:


Using 2D W coding, the 1D sequences are arranged into 2D matrices:


#### **5 Invariant Properties of Variant Constructions**

It is interesting to notice that under *QV* operations, there are 2*n* + 2 vectors available to generate QVS. This makes a significant difference between classical logic and variant logic construction [50–56]. The main results of this chapter are summarized in the following theorems.

**Theorem 8** (Four Invariant Points for One Variable Condition) *For a W coding scheme under one variable condition, four points of the structure correspond to four functions:* {0, *x*, *x*¯, 1}*, respectively.*

*Proof* When *n* = 1, four vectors are available for any *Q* or *QV* operations.

**Theorem 9** (Two Invariant Points for Truth and False Schemes) *For any n* > 1 *and W (WV) coding schemes, for any truth or false representation, only the all-0 or all-1 valued vectors remain invariant under P operations.*

*Proof* Under a *P* operation, any vector that is not all 0 or all 1 has its binary number sequence changed.

**Theorem 10** (Four Invariant Points for C Coding Scheme) *For any C (CV) coding scheme in variant construction, the four corner positions of the 2D function matrix have extreme invariant properties.*

*Proof* Under the C (CV) coding scheme, the four functions {0, *x*, *x̄*, 1} correspond as follows: *x* = ⟨0|0⟩; 0 = ⟨2<sup>2<sup>*n*−1</sup></sup> − 1|0⟩; 1 = ⟨0|2<sup>2<sup>*n*−1</sup></sup> − 1⟩; *x̄* = ⟨2<sup>2<sup>*n*−1</sup></sup> − 1|2<sup>2<sup>*n*−1</sup></sup> − 1⟩. All four positions are corner points of the variant matrix.

#### **6 Comparison**

It is convenient to list numeric parameters to compare the different coding schemes in the following table.


where we use Var: variable number; State: state number; Function: function number; ExPower: exponent power products; SL: SL coding number; W coding: W coding number under *Q* + *P* operations; WV coding: WV coding number under *QV* + *P* operations; C coding: C coding number under *Q* + *P* operations; CV coding: CV coding number under *QV* + *P* operations in the table, respectively.

#### **7 Conclusion**

In this chapter, variant logic has been proposed to extend the truth table representation so that it describes variant properties of binary sequences. This extension is required to expand the traditional Boolean logic framework to a new variation space. Under two types of vector operations, the new space is 2<sup>2<sup>*n*</sup></sup> · 2<sup>*n*</sup>! times more complex than the traditional Boolean function space with its 2<sup>2<sup>*n*</sup></sup> members. In order to manage this complexity, the framework proposes a series of global coding schemes, encoded through symmetric properties, that represent the elements in a matrix as a 2D map. Under this two-dimensional model, coding mechanisms can be constructed and their invariant properties can be discussed.

Boolean function space represents a core invariant functional space and the newly expanded space broadens the descriptions and coding schemes used. Thus, a wide area of variation coding can be developed. In essence, the space of binary sequence functions can be thought of as a keyboard with 2<sup>2</sup>*<sup>n</sup>* notes. Each note contains a complete Boolean function set and its own representation. The set of notes can be represented using a coding scheme that orders the notes in a particular sequence (SL and G codes) or their 2D maps (W, F and C codes).

Under W coding representation mechanism, 2D matrix is suitable to visualize permutation sequences of *n* variable logic structures. Using invariant properties, classical logic and variant logic can be clearly identified. Further work on dynamic behaviours of complex dynamic systems can be explored. This chapter outlines the construction and notation of variant logic only. Future papers will show that the proposed scheme, with its foundation in symmetry, will have definite uses for predicting convergent and chaotic behaviour in dynamic binary systems such as the analysis of cellular automata rules using various visual methodologies.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part II Theoretical Foundation—Variant Measurement

All of mathematics is a tale about groups. —Henri Poincaré

In geometric and physical applications, it always turns out that a quantity is characterized not only by its tensor order, but also by symmetry.

> —Hermann Weyl

Nothing exists until it is measured. —Niels Bohr

A series of research papers on variant measurement was published during 2011–2012. Two open-access book chapters that express core results of variant measurement (Chapters "From Local Interactive Measurements to Global Matrix Representations on Variant Construction" and "From Conditional Probability Measurements to Global Matrix Representations on Variant Construction") were published in Advanced Topics in Measurements: 339–400 (2012) by InTech Press.

Part II is composed of three chapters (3–5).

Chapter "Elementary Equations of Variant Measurement" provides the elementary equation of variant measurement to discuss four meta measures under permutative and associative properties. Two sets of sample partitions are expressed as sum of product of binomial coefficients in the elementary equation. This is a systematic approach to handle configuration space under four meta measures.

Chapter "Triangular Numbers and Their Inherent Properties" uses triangular numbers to express inherent properties of 1D binary sequences under three parameters as an elementary equation. A set of interesting properties were explored. This scheme provides efficient partitions to handle rotational invariant properties on binary sequences.

Chapter "Symmetric Clusters in Hierarchy with Cryptographic Properties" describes symmetric clusters in hierarchy under multiple symmetric operations: combination, crossing, variant, and rotation conditions. Rich clusters were observed under various conditions.

# **Elementary Equations of Variant Measurement**

**Jeffrey Zheng**

**Abstract** Four variant measures are used to represent combinatorial functions including binomial coefficients. These variant measures are based on two types of *m*-bit vectors. Type A corresponds to non-periodic boundary conditions, while Type B corresponds to periodic boundary conditions. For each type, groups containing the four variant measures are formed, which are invariant against permutative and associative operations. By mapping two group elements of Type B on coefficients of binomial decompositions, patterns similar to Pascal's triangle are observed.

**Keywords** Variant measurement · *m* variable vector · Multinomial coefficient · Permutative and associative operations · Global invariant

#### **1 Introduction**

For any *n* 0–1 variables, variant logic provides a 2<sup>*n*</sup>! × 2<sup>2<sup>*n*</sup></sup>-dimensional configuration space [16, 17] to support measurement and analysis [14, 15], which poses a real difficulty for any practical activity [1, 9–11]. From a measuring analysis viewpoint [6–8, 13], manipulating static states and their measuring clusters as effective measures is a core part of any 0–1 measuring framework. In this chapter, starting from the *m* variables of a 0–1 vector, binomial expressions are applied to support the four meta measures of variant partitions and the associated multinomial expressions.

J. Zheng (✉), Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China; e-mail: conjugatelogic@yahoo.com

J. Zheng, Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_3

Using permutative and associative operations, various variation and invariant properties are investigated. From a global invariant viewpoint, various combinatorial clustering properties are systematically explored.

#### **2 Elementary Equation**

Let *x* be an *m*-bit vector, *x* = *x*<sub>0</sub>*x*<sub>1</sub> ... *x<sub>i</sub>* ... *x*<sub>*m*−1</sub>, *x<sub>i</sub>* ∈ {0, 1}, 0 ≤ *i* < *m*, *x* ∈ *B*<sub>2</sub><sup>*m*</sup>. Each *x* is an *m*-bit state. From a variation viewpoint, two types {*A*, *B*} are distinguished. Let {*m*<sub>⊥</sub>, *m*<sub>+</sub>, *m*<sub>−</sub>, *m*<sub>⊤</sub>} be four measuring operators.

#### *2.1 Type A Measures*

Each pair of adjacent elements (*x<sub>i</sub>*, *x*<sub>*i*+1</sub>), 0 ≤ *i* < *m* − 1, forms a partition (non-periodic boundary conditions).

Four measures can be calculated from the following equations.

$$m\_{\perp}(\mathbf{x}) = \sum\_{i=0}^{m-2} [(\mathbf{x}\_i, \mathbf{x}\_{i+1}) == (0, 0)] \tag{1}$$

$$m\_+(\mathbf{x}) = \sum\_{i=0}^{m-2} [(\mathbf{x}\_i, \mathbf{x}\_{i+1}) == (0, 1)] \tag{2}$$

$$m\_{-}(\mathbf{x}) = \sum\_{i=0}^{m-2} [(\mathbf{x}\_i, \mathbf{x}\_{i+1}) == (1, 0)] \tag{3}$$

$$m\_{\top}(\mathbf{x}) = \sum\_{i=0}^{m-2} [(\mathbf{x}\_i, \mathbf{x}\_{i+1}) == (1, 1)] \tag{4}$$

$$m = m\_\perp(\mathbf{x}) + m\_+(\mathbf{x}) + m\_-(\mathbf{x}) + m\_\top(\mathbf{x}) + 1\tag{5}$$
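Equations (1)–(5) translate directly into a pair count over adjacent positions. The following is a minimal sketch; the function name and the tuple ordering (⊥, +, −, ⊤) are choices made here for illustration.

```python
from itertools import product


def type_a_measures(x: tuple[int, ...]) -> tuple[int, int, int, int]:
    """Return (m_perp, m_plus, m_minus, m_top) for non-periodic boundaries.

    Counts the adjacent pairs (x_i, x_{i+1}), 0 <= i < m - 1, equal to
    (0,0), (0,1), (1,0) and (1,1), as in Eqs. (1)-(4).
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for pair in zip(x, x[1:]):
        counts[pair] += 1
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def eq5_holds(m: int) -> bool:
    """Check Eq. (5), m = sum of the four measures + 1, for every m-bit vector."""
    return all(sum(type_a_measures(x)) + 1 == m for x in product((0, 1), repeat=m))
```

For example, `type_a_measures((0, 1, 1, 0, 0))` returns `(1, 1, 1, 1)`: the four pairs of this 5-bit vector fall into one class each.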

From a clustering viewpoint, the last bit of *x*, *x*<sub>*m*−1</sub>, can be used to distinguish the relevant combinatorial numbers. For *x*<sub>*m*−1</sub> == 1, there are

$$\binom{m-1}{m\_+ + m\_\top + 1}$$

and for *x*<sub>*m*−1</sub> == 0, there are

$$\binom{m-1}{m\_+ + m\_\top}$$

possible *x* vectors, where *m*<sub>+</sub> + *m*<sub>⊤</sub> is the number of 1 elements in a vector. By adding both binomial coefficients, Pascal's rule [4] is obtained.

$$\begin{aligned} \binom{m}{p} &= \binom{m-1}{p} + \binom{m-1}{p-1}, \\ p(\mathbf{x}) &= m\_+(\mathbf{x}) + m\_\top(\mathbf{x}) + 1, 0 \le p \le m, \mathbf{x} \in B\_2^m \end{aligned} \tag{6}$$

#### *2.2 Type B Measures*

The elements are linked as a ring, so the pairs are (*x<sub>i</sub>*, *x*<sub>*i*+1 (mod *m*)</sub>), 0 ≤ *i* < *m* (periodic boundary conditions).

$$m\_{\perp}(\mathbf{x}) = \sum\_{i=0}^{m-1} [(\mathbf{x}\_i, \mathbf{x}\_{i+1 \text{(mod } m)}) == (0, 0)] \tag{7}$$

$$m\_+(\mathbf{x}) = \sum\_{i=0}^{m-1} [(\mathbf{x}\_i, \mathbf{x}\_{i+1 \text{(mod } m)}) == (0, 1)] \tag{8}$$

$$m\_{-}(\mathbf{x}) = \sum\_{i=0}^{m-1} [(\mathbf{x}\_i, \mathbf{x}\_{i+1 \text{(mod } m)}) == (1, 0)] \tag{9}$$

$$m\_{\top}(\mathbf{x}) = \sum\_{i=0}^{m-1} [(\mathbf{x}\_i, \mathbf{x}\_{i+1 \text{(mod } m)}) == (1, 1)] \tag{10}$$

$$m = m\_{\perp}(\mathbf{x}) + m\_{+}(\mathbf{x}) + m\_{-}(\mathbf{x}) + m\_{\top}(\mathbf{x})\tag{11}$$

Let *p* be the number of 1 elements, *p*(*x*) = *m*<sub>+</sub>(*x*) + *m*<sub>⊤</sub>(*x*); then the number of possible *x* vectors is

$$
\binom{m}{p}, \ 0 \le p \le m. \tag{12}
$$
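The ring version differs from Type A only in the wrap-around pair. The sketch below counts the measures of Eqs. (7)–(10) and tallies how many *m*-bit vectors share each value of *p*, which can be compared with Eq. (12); the function names are assumptions made here.

```python
from collections import Counter
from itertools import product
from math import comb


def type_b_measures(x: tuple[int, ...]) -> tuple[int, int, int, int]:
    """Return (m_perp, m_plus, m_minus, m_top) for periodic boundaries."""
    m = len(x)
    c = Counter((x[i], x[(i + 1) % m]) for i in range(m))  # includes the pair (x_{m-1}, x_0)
    return c[(0, 0)], c[(0, 1)], c[(1, 0)], c[(1, 1)]


def vectors_by_p(m: int) -> Counter:
    """Tally the m-bit vectors by p = m_plus + m_top."""
    tally = Counter()
    for x in product((0, 1), repeat=m):
        _, plus, _, top = type_b_measures(x)
        tally[plus + top] += 1
    return tally


def eq12_holds(m: int) -> bool:
    """Compare the tally with the binomial coefficients of Eq. (12)."""
    tally = vectors_by_p(m)
    return all(tally[p] == comb(m, p) for p in range(m + 1))
```

On the ring every position is the second element of exactly one pair, so *p* equals the number of 1 bits, and `type_b_measures((0, 1, 1, 0, 0))` gives `(2, 1, 1, 1)`.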

#### **3 Partition**

For either Type A or Type B, internal parameters are associated with the four meta measures. For brevity, Type B is analysed first, and multinomial coefficients are applied to partition the relevant binomial coefficients. Using *m* variables, the number *p* and *q* branches, the following equations are formulated. Under the partition condition, the vector *x* can be omitted from the notation.

$$m = m\_{\perp} + m\_{+} + m\_{-} + m\_{\top} \tag{13}$$

$$p = m\_+ + m\_\top \tag{14}$$

$$m - p - q = m\_{\perp} \tag{15}$$

$$q = m\_+ = m\_- \tag{16}$$

$$p - q = m\_{\top} \tag{17}$$

Based on these equal quantities, there is a one-to-one correspondence between the four meta measures and the relevant quantitative measures,

$$\{m\_\perp, m\_+, m\_-, m\_\top\} \leftrightarrow \{m - p - q, q, q, p - q\}$$

which establishes an equivalent expressional framework under a global restriction.
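As a small worked instance of Eqs. (13)–(17), take the ring vector *x* = 01100 (an example chosen here for illustration, not taken from the chapter). Its cyclic pairs are (0,1), (1,1), (1,0), (0,0), (0,0), so *m* = 5, *p* = 2 and *q* = 1, and the four measures follow from the parameters alone:

$$m\_\perp = m - p - q = 2, \quad m\_+ = m\_- = q = 1, \quad m\_\top = p - q = 1, \quad m\_\perp + m\_+ + m\_- + m\_\top = 5 = m$$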

From an expressional viewpoint, different partitions are investigated from a single binomial coefficient to a set of multinomial coefficients with equivalent properties among different expressions. Their partitions undertaken on various levels are illustrated in the following sections. From a binomial coefficient, there are multiple levels of representations involved, the first level and the nth level can be connected as

$$
\begin{pmatrix} m\_{\perp} + m\_{+} + m\_{-} + m\_{\top} \\ p \end{pmatrix} \to \sum\_{k=0}^{p} \prod\_{l=1}^{n} \binom{f\_{l}(m\_{\perp}, m\_{+}, m\_{-}, m\_{\top})}{g\_{l}(p, k)} \qquad (18)
$$

$$
0 \le p \le m \qquad 0 \le k \le m.
$$

The core content of this chapter is to establish a global invariant framework using *n* levels of representations by deriving the functions *f<sub>l</sub>* and *g<sub>l</sub>*.

#### **4 Variation Space**

Let {*a*, *b*, *c*, *d*} be a set of four distinct measures. Two operations, permutative and associative, can be defined. For an ordered tuple of four measures (*a*, *b*, *c*, *d*), the permutative operator π: (*a*, *b*, *c*, *d*) → (π(*a*), π(*b*), π(*c*), π(*d*)) maps each measure to another measure.

The associative operator α: {*a*, *b*, *c*, *d*} → α{*a*, *b*, *c*, *d*} groups the measures into one or more groups while keeping the initial ordering.

For example, (*a*, *b*, *c*, *d*) → (*b*, *d*, *a*, *c*) is a permutative operation and {*a*, *b*, *c*, *d*} → {*a*, *b*}{*c*}{*d*} is an associative operation.

A permutative operation changes the order of the four tuple variables, and an associative operation changes the grouping of neighbouring elements. Under ordinary arithmetic, both operations conserve the sum of the measures, which gives global invariant properties. From an algebraic viewpoint, the two operations are independent.

**Lemma 1** *For an ordering structure with four measures under two operations: permutative and associative, there are 192 configurations identified.*

*Proof* For a vector with 4 members, there are a total of 24 distinct permutations 4! = 24. For an ordered set of 4 elements, 8 associated patterns are identified as follows: {{a,b,c,d}; {a}{b,c,d}; {a,b}{c,d}; {a,b,c}{d}; {a}{b}{c,d}; {a}{b,c}{d}; {a, b}{c}{d}; {a}{b}{c}{d}}. Two operations are independent, so the whole system contains 24 × 8 = 192 configurations.
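The count in Lemma 1 can be reproduced by brute-force enumeration. The sketch below represents an associative pattern as a choice of cut points between neighbouring elements, which is one possible encoding assumed here; the function name is likewise illustrative.

```python
from itertools import permutations, product


def groupings(items):
    """All ways to cut an ordered sequence into consecutive non-empty groups."""
    result = []
    for cuts in product([False, True], repeat=len(items) - 1):
        groups, current = [], [items[0]]
        for item, cut in zip(items[1:], cuts):
            if cut:                      # close the current group before this item
                groups.append(tuple(current))
                current = [item]
            else:
                current.append(item)
        groups.append(tuple(current))
        result.append(tuple(groups))
    return result


# Every permutation of the four measures combined with every grouping.
configurations = {g for perm in permutations("abcd") for g in groupings(perm)}
```

There are 2<sup>3</sup> = 8 groupings because each of the three gaps between neighbours is either cut or not, and `len(configurations)` is 24 × 8 = 192.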

#### **5 Invariant Combination**

Using both permutative and associative operations, various combinatorial invariants can be identified.

#### *5.1 Type A Invariants*

Five invariant groups can be distinguished.


**Proposition 1** *For a measuring structure with four members, Type A has 16 combinatorial invariants distinguished (0 item: 1 cluster; 1 item: 1 cluster; 2a item: 4 clusters; 2b item: 3 clusters; 3 item: 6 clusters; 4 item: 1 cluster).*

*Proof* Checking the Type A conditions listed, all combinatorial conditions are exhaustively included.

#### *5.2 Type B Invariants*

For Type B, let *b* = *c*; the following simplification can then be performed.


**Proposition 2** *For a measuring structure with four members, Type B has 12 combinatorial invariants distinguished (0 item: 1 cluster; 1 item: 1 cluster; 2a item: 3 clusters; 2b item: 2 clusters; 3 item: 4 clusters; 4 item: 1 cluster).*

*Proof* Checking the Type B conditions listed, all combinatorial conditions are exhaustively included.

#### **6 Combinatorial Expressions of Type B Invariants**

Applying *m*<sub>⊥</sub> = *m* − *p* − *q*, *m*<sub>+</sub> = *m*<sub>−</sub> = *q*, *m*<sub>⊤</sub> = *p* − *q* to replace {*a*, *b*, *c*, *d*}, there are 11 effective formulas:


**Corollary 1** *Type B invariants include 11 nontrivial expressions.*

*Proof* Only 0 item is a trivial one.

#### **7 Two Combinatorial Formulas and Quantitative Distributions**

From a combinatorial viewpoint, the 1 item formula is the binomial coefficient

$$\binom{m}{p}, \ 0 \le p \le m,$$

which shows various partition properties with the relevant parameters. For convenient illustration, two expressions are selected from the 2 clusters of the 2b item of Type B: {*m* − *p*}{*p*} and {2*q*}{*m* − 2*q*}.

#### *7.1 Case I. {m − p}{p}*

In combinatorics, the following identity for binomial coefficients:

$$
\binom{m+n}{r} = \sum\_{k=0}^{r} \binom{m}{k} \binom{n}{r-k}
$$

is Vandermonde's identity (or Vandermonde's convolution), for any nonnegative integers *r*, *m*, *n*. The identity is named after Alexandre-Théophile Vandermonde (1772), although it was already known in 1303 by the Chinese mathematician Zhu Shijie (Chu Shi-Chieh) [2, 3, 5, 12].

Applying Chu-Vandermonde's identity to identify {*m* − *p*}{*p*} as *f*<sup>1</sup> and *f*<sup>2</sup> in Eq. (18), the binomial coefficient in level *n* = 2 can be written as

$$
\binom{m}{p} = \sum\_{k=0}^{p} \binom{m-p}{k} \binom{p}{p-k} \tag{19}
$$

$$
= \sum\_{k=0}^{p} \binom{m-p}{k} \binom{p}{k}, 0 \le p \le m.
$$
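As a quick numerical check of Eq. (19), with values chosen here for illustration, take *m* = 5 and *p* = 2. Both forms of the sum give the same terms:

$$\binom{5}{2} = \sum\_{k=0}^{2} \binom{3}{k} \binom{2}{k} = 1 \cdot 1 + 3 \cdot 2 + 3 \cdot 1 = 10$$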

In this way, each binomial coefficient on the left-hand side of Eq. (19) is the sum of *p* + 1 products of pairs of binomial coefficients.

**Theorem 1** *For all coefficients of Type B, the sum of all coefficients in* {*m* − *p*}{*p*}, 0 ≤ *p* ≤ *m*, *is equal to* 2<sup>*m*</sup>*.*

*Proof* Since

$$\forall m > 0, \sum\_{p=0}^{m} \binom{m}{p} = 2^m, \sum\_{k=0}^{p} \binom{m-p}{k} \binom{p}{k} = \binom{m}{p}.$$

so

$$\sum\_{p=0}^{m} \sum\_{k=0}^{p} \binom{m-p}{k} \binom{p}{k} = 2^m.$$

According to Theorem 1, all values of the products in Eq. (19) are distributed in a 2D array with (*m* + 1)<sup>2</sup> entries.

For example, when *m* = 10, all coefficients lie in an 11 × 11 region, and the nontrivial values form a triangle with reflection symmetry in *p*.

$$m > 0, 0 \le k, \; p \le m, \; \{f(m, p, k) = \binom{m - p}{k}\binom{p}{k}\}:$$
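The array *f*(*m*, *p*, *k*) can be generated directly with the standard-library binomial function. This is a minimal sketch; the row and column layout (rows indexed by *k*, columns by *p*) is an assumption made here.

```python
from math import comb


def case1_array(m: int) -> list[list[int]]:
    """F[k][p] = C(m - p, k) * C(p, k) for 0 <= k, p <= m.

    math.comb returns 0 when k exceeds the upper argument, so entries
    outside the triangle are zero automatically.
    """
    return [[comb(m - p, k) * comb(p, k) for p in range(m + 1)]
            for k in range(m + 1)]


F = case1_array(10)
total = sum(map(sum, F))                                        # sum of all entries
column_sums = [sum(F[k][p] for k in range(11)) for p in range(11)]
```

For *m* = 10 the total is 2<sup>10</sup> = 1024, each column sum reproduces the binomial coefficient of 10 over *p*, and every row is symmetric under *p* → *m* − *p*.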


#### *7.2 Case II. {2q}{m − 2q}*

Applying Chu-Vandermonde's identity to identify {2*q*}{*m* − 2*q*} as *f*<sup>1</sup> and *f*<sup>2</sup> in Eq. (18), the binomial coefficient in level *n* = 2 can be written as

$$
\binom{m}{p} = \sum\_{k=0}^{p} \binom{2q}{k} \binom{m-2q}{p-k} \tag{20}
$$

$$
0 \le p \le m, 0 \le q \le \lfloor m/2 \rfloor
$$

By using this formula, each selected value of *q* in the summand of Eq. (20) forms one of ⌊*m*/2⌋ + 1 2D coefficient distributions.

**Theorem 2** *For the Type B* {2*q*}{*m* − 2*q*} *equation with* 0 ≤ *p* ≤ *m*, 0 ≤ *q* ≤ ⌊*m*/2⌋*, each selected value of q gives one of* ⌊*m*/2⌋ + 1 *2D arrays, and the sum of all coefficients in each 2D array is equal to* 2<sup>*m*</sup>*.*

*Proof* Since

$$\forall m > 0, 0 \le q \le \lfloor m/2 \rfloor, \binom{m}{p} = \sum\_{k=0}^p \binom{2q}{k} \binom{m-2q}{p-k} \text{ and } \sum\_{p=0}^m \binom{m}{p} = 2^m,$$

so

$$\sum\_{p=0}^{m} \sum\_{k=0}^{p} \binom{2q}{k} \binom{m-2q}{p-k} = 2^m$$

According to Theorem 2, the coefficients of the summand in Eq. (20) are distributed in ⌊*m*/2⌋ + 1 levels of (*m* + 1) × (*m* + 1) 2D planes.

For e.g., while *m* = 10, all coefficients are arranged on 6 levels of 11 × 11 regions with multiple symmetric properties.

$$m > 0, \{ f(m, q, p, k) = \binom{2q}{k} \binom{m - 2q}{p - k} \} : 0 \le k, \, p \le m, \, 0 \le q \le \lfloor m/2 \rfloor$$


$$
m = 10,\ q = 3: \quad
\begin{array}{c|ccccccccccc}
k \backslash p & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
\hline
10 & & & & & & & & & & & \\
9 & & & & & & & & & & & \\
8 & & & & & & & & & & & \\
7 & & & & & & & & & & & \\
6 & & & & & & & & & & & \\
5 & & & & & & & & & & & \\
4 & & & & & 1 & 6 & 15 & 20 & 15 & 6 & 1 \\
3 & & & & 4 & 24 & 60 & 80 & 60 & 24 & 4 & \\
2 & & & 6 & 36 & 90 & 120 & 90 & 36 & 6 & & \\
1 & & 4 & 24 & 60 & 80 & 60 & 24 & 4 & & & \\
0 & 1 & 6 & 15 & 20 & 15 & 6 & 1 & & & &
\end{array}
$$
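The ⌊*m*/2⌋ + 1 planes of Case II can be generated in the same way as a check of Theorem 2. The sketch below follows Eq. (20) literally, attaching *k* to the 2*q* factor; attaching *k* to the *m* − 2*q* factor instead moves each value from (*k*, *p*) to (*p* − *k*, *p*) and leaves every column sum unchanged. The function name and array layout are assumptions made here.

```python
from math import comb


def case2_plane(m: int, q: int) -> list[list[int]]:
    """F[k][p] = C(2q, k) * C(m - 2q, p - k), taken as 0 when p < k."""
    if not 0 <= q <= m // 2:
        raise ValueError("need 0 <= q <= floor(m / 2)")
    a, b = 2 * q, m - 2 * q
    return [[comb(a, k) * comb(b, p - k) if p >= k else 0
             for p in range(m + 1)]
            for k in range(m + 1)]


m = 10
planes = {q: case2_plane(m, q) for q in range(m // 2 + 1)}
totals = {q: sum(map(sum, plane)) for q, plane in planes.items()}
```

For *m* = 10 there are six planes, each summing to 2<sup>10</sup> = 1024, and in every plane the sum over *k* for a fixed *p* is the binomial coefficient of 10 over *p*.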


#### *7.3 Result Analysis*

The two formulas selected from the 2b item of Type B show completely different properties. In Case I, for a given *m*, all coefficients are distributed in one triangular area with reflection symmetry in the *p* direction.

In contrast, Case II provides multiple levels of 2D distributions, each corresponding to a selected *q* value. Among the three listed conditions, *q* = 0 and *q* = 5 are linear structures: the first is located on the diagonal positions of the plane and the second on the horizontal region *k* = 0, *p* = {0, 1, ..., 10}. For 0 < *q* < 5, all distributions appear as parallelograms, and each line shows its own symmetries. As *q* varies, the horizontal projection stays the same, whereas the vertical projection changes from a binomial distribution at *q* = 0 to a pulse at *q* = ⌊*m*/2⌋. Such controllable properties could be useful in future advanced applications.

#### **8 Conclusion**

A new approach to decompose binomial coefficients under permutative and associative operations is proposed. Using this approach, it is feasible to investigate four meta measures in global invariant spaces. The resulting set of 192 configurations is categorized using standard group-theoretic mechanisms. From a statistical viewpoint, Type A (five levels in 16 clusters) and Type B (five levels in 12 clusters) provide global identifications of complicated partitions under wider restrictions. Further theoretical explorations and practical applications are expected in the coming period.

**Acknowledgements** The author would like to thank Chris Zheng for refined clustering analysis on random sequences to open a new way in binomial expressions, Yifeng Zheng and Kaiyu Yang for generating binomial coefficients in different conditions and Dr. Dennis Heim for correction of the chapter.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Triangular Numbers and Their Inherent Properties**

**Chris Zheng and Jeffrey Zheng**

**Abstract** A method to classify one-dimensional binary sequences using three parameters intrinsic to the sequence itself is introduced. The classification scheme creates combinatorial patterns that can be arranged in a two-dimensional triangular structure. Projections of this structure contain interesting properties related to the Pascal triangle numbers. The arrangement of numbers within the triangular structure has been named "triangular numbers", and the essential parameters, elementary equation, and sequencing schemes are discussed as well as visualizations of sample distributions, special cases, and search results. We believe this to be a novel finding as sequences generated using this method are not contained in the On-Line Encyclopedia of Integer Sequences or OEIS.

**Keywords** Binary sequence · Classification · Combinatorial patterns · Triangular number · Elementary equation · Variant triangle

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

C. Zheng Tahto, Sydney, Australia e-mail: z@caudate.me

J. Zheng (B) Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

# **1 Introduction**

Additive number theory [7], the study of integer subsets and their behavior under addition, is a branch of mathematics related to combinatorics. The simplest constructs within this field are binomial coefficients [6]. The properties of binomial coefficients have been explored by many over the history of mathematics [8, 9]. One generalization of the binomial coefficient is the multinomial coefficient [5, 8, 9]. Any multinomial coefficient can be expressed as a product of multiple binomial coefficients:

$$
\binom{k\_1 + k\_2 + \cdots + k\_m}{k\_1, k\_2, \ldots, k\_m} = \binom{k\_1 + k\_2}{k\_1} \binom{k\_1 + k\_2 + k\_3}{k\_1 + k\_2} \ldots \binom{k\_1 + k\_2 + \cdots + k\_m}{k\_1 + k\_2 + \cdots + k\_{m-1}}.\tag{1}
$$

Among expansions of this type, the simplest is the trinomial coefficient [10–13]:

$$
\binom{r}{k, m-k, r-m} = \binom{r}{m} \binom{m}{k} \,. \tag{2}
$$
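Equations 1 and 2 can be checked on small numbers. The sketch below is illustrative only: the function names `multinomial` and `binomial_product` are assumptions introduced here, and the sample values are arbitrary.

```python
from math import comb, factorial

def multinomial(ks):
    """Multinomial coefficient (k1 + ... + km)! / (k1! * ... * km!)."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result

def binomial_product(ks):
    """Right-hand side of Eq. 1: a chain of binomial coefficients over partial sums."""
    product, partial = 1, ks[0]
    for k in ks[1:]:
        product *= comb(partial + k, partial)
        partial += k
    return product

# Eq. 1 with (k1, k2, k3) = (2, 3, 4).
assert multinomial([2, 3, 4]) == binomial_product([2, 3, 4])

# Eq. 2 with r = 7, m = 4, k = 1: the trinomial coefficient equals C(r, m) * C(m, k).
r, m, k = 7, 4, 1
assert multinomial([k, m - k, r - m]) == comb(r, m) * comb(m, k)
```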

#### *1.1 Geometric Arrangement of Combinatorial Data*

In discrete geometry [2], as the most basic 2D shape, triangular patterns are found in such series as combinatorial triangle A102639, differential triangle A194005 [1], additive triangle A035312, and Pascal triangle A007318 [8, 9, 11, 13].

This chapter proposes a novel method of classifying binary sequences that is shown to be combinatorial in nature. By using a simple basis of binary (0–1) sequences and applying simple classification rules, a triangular structure can be generated. The set of results has been named "Generative Triangular Numbers". The term generative [3] is used to describe the technique of using a simple input and a repeatedly applied process, creating emergent properties through repetition. Generative science [4] is a multidisciplinary science that explores the natural world and its complex behaviors as a generative process. Generative approaches can be used to simulate and describe behaviors in fractals, cellular automata, and various nonlinear systems.

The generated patterns are not currently found in the On-Line Encyclopedia of Integer Sequences (OEIS), potentially making them an interesting area for further research.

#### *1.2 Previous Work*

The current scheme is derived from the work of Zheng et al. [16, 17], which organizes 1D 0–1 sequences as vectors of a certain length *N* > 1, using three parameters in variant measurement construction and classification on hierarchical discrete phase spaces in general.

A trinomial equation is proposed as an elementary equation using three control parameters {*q*, *p*, *N*} [14, 15] to describe 0–1 vectors of length *N* as subgroups, where *N* is the length of a vector, *p* is the number of elements with value 1, and *q* is the number of changes from 0 to 1 (equivalently, from 1 to 0) when the vector is read in circular form. For a fixed *N*, the subgroup numbers form a 2D array of nontrivial triangular numbers. Applying the elementary equation generatively arranges the relevant triangular numbers as a geometric distribution in a hierarchical 3D array. From this hierarchical 3D array, different integer sequences can be observed, and one projection on the *p* direction is evaluated using Vandermonde's identity to show its correspondence to the standard binomial coefficients. The main results are provided as algorithms, theorems, and corollaries. Sample cases are illustrated and possible meanings are discussed.

#### **2 Definitions and Sample Cases**

#### *2.1 Definitions*

**Definition 1** Let *X* be a 0–1 vector, *X* = *x*<sub>*N*−1</sub> ... *x*<sub>*i*</sub> ... *x*<sub>0</sub>, with *N* elements as a state, *x*<sub>*i*</sub> ∈ {0, 1}, 0 ≤ *i* < *N*.

**Definition 2** Let Ω(*N*) denote a vector space containing all 0–1 vectors of length *N*, Ω(*N*) = {∀*X* | 0 ≤ *X* < 2<sup>*N*</sup>}, as an initial data set.

**Definition 3** Let $\binom{n}{k}$ be a binomial coefficient; it satisfies

$$
\binom{n}{k} = \begin{cases} 1, & \text{if } n = k; \\ 0, & \text{if } n \neq k, k > n \text{ or } k < 0; \\ \frac{n!}{k! \times (n-k)!}, & \text{otherwise.} \end{cases} \tag{3}
$$

Under this condition, |Ω(*N*)| = 2<sup>*N*</sup>, and Ω(*N*) forms a vector space of vectors of length *N*.

**Definition 4** For any selected vector *X* ∈ Ω(*N*), *p*(*X*) can be determined by

$$p(X) = \sum\_{i=0}^{N-1} x\_i, x\_i \in \{0, 1\}. \tag{4}$$

**Lemma 1** *For a vector space* Ω(*N*)*, p provides a complete partition on a subgroup and the number of vectors in the subgroup is a binomial coefficient.*

*Proof* For a given *p*, 0 ≤ *p* ≤ *N*, its combinatorial property makes a total number of $\binom{N}{p} = \frac{N!}{p! \times (N-p)!}$ vectors identified to partition the vector space Ω(*N*).

**Definition 5** For a circular vector *X* ∈ Ω(*N*), *q*(*X*) can be determined by

$$q(X) = \sum\_{0 \le i < N} (x\_i \equiv 0) \,\&\, (x\_{i+1} \equiv 1); \quad x\_i, x\_{i+1} \in \{0, 1\}, \ (i+1) \bmod N, \tag{5}$$

e.g., *N* = 10, *X* = 1110011001, *p*(*X*) = 6(*i* = {0, 3, 4, 7, 8, 9}); *q*(*X*) = 2(*i* = {2, 6}).
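Definitions 4 and 5 translate directly into a few lines of code. The following is a minimal sketch under the assumption that a vector is given as a string written *x*<sub>*N*−1</sub> ... *x*<sub>0</sub>; the function names `p_count` and `q_count` are introduced here for illustration only.

```python
def p_count(bits):
    """Number of 1 elements in a 0-1 string written as x_{N-1} ... x_0 (Eq. 4)."""
    return bits.count("1")

def q_count(bits):
    """Number of circular 0 -> 1 steps: x_i = 0 and x_{(i+1) mod N} = 1 (Eq. 5)."""
    x = [int(b) for b in reversed(bits)]  # x[i] is the element with index i
    n = len(x)
    return sum(1 for i in range(n) if x[i] == 0 and x[(i + 1) % n] == 1)

X = "1110011001"
print(p_count(X), q_count(X))  # 6 2
```

The string is reversed before indexing so that `x[i]` matches the subscript convention of Definition 1, and the modulo wraps index *N* back to 0 to treat the vector as circular.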

#### *2.2 Sample Cases*

Under this construction, any selected vector can be evaluated by the three parameters. When this set of parameters is applied to create subgroups, interesting inner structures can be identified. For example, for *N* = 4, all 16 vectors in the vector space can be distinguished into six subgroups by their (*q*, *p*) values, as shown in Table 1.

Each subgroup is linked to its corresponding vectors in Table 2.

Enumeration numbers of the relevant subgroups are shown in Table 3.

**Table 1** Six subgroups for *N* = 4 vector space in (*q*, *p*) partitions


**Table 2** Six subgroups, vectors, and enumerating numbers


**Table 3** *N* = 4, (*q*, *p*) subgroup numbers and a projection



**Table 4** Six levels of binomial coefficients and generative triangular numbers

From Table 3, it is easy to verify that the numbers of the six subgroups sum to all 16 vectors. The projected subgroup numbers are the same as the binomial coefficients for *N* = 4. Applying this correspondence for *N* = 1–6, six rows of the original binomial coefficients can be generated in a three-dimensional organization, where each row {*p*, *N*} sequence corresponds to a (*q*, *p*) triangular shape, as shown in Table 4.

This type of relationship can be extended by a generative mechanism from the special cases *N* = 1–6 to general conditions for any given *N*. The detailed generative triangular mechanism is described in the next section.
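The *N* = 4 partition described in this section can be reproduced by brute force. The sketch below is an illustration, not the authors' procedure: the helper `classify` is a hypothetical name, and it enumerates every vector of Ω(*N*) and counts the members of each (*q*, *p*) subgroup.

```python
from collections import Counter
from math import comb

def classify(n):
    """Count the vectors of Omega(n) in each (q, p) subgroup by brute force."""
    groups = Counter()
    for value in range(2 ** n):
        x = [(value >> i) & 1 for i in range(n)]  # x[i] is bit i of the state
        p = sum(x)
        q = sum(1 for i in range(n) if x[i] == 0 and x[(i + 1) % n] == 1)
        groups[(q, p)] += 1
    return groups

groups = classify(4)
for (q, p), count in sorted(groups.items()):
    print(f"q={q} p={p} count={count}")

# Projecting out q leaves the binomial coefficients C(4, p).
projection = [sum(c for (_, pp), c in groups.items() if pp == p) for p in range(5)]
assert projection == [comb(4, p) for p in range(5)]
```

The enumeration costs 2<sup>*N*</sup> steps, so it is only practical for small *N*; it serves here as an independent check on the closed-form expression introduced in the next section.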

#### **3 Elementary Equations**

**Definition 6** Let *f*(*q*, *p*, *N*) denote a function for generative triangular numbers, 0 ≤ *p* ≤ *N*, 0 ≤ *q* ≤ ⌊*N*/2⌋. For the initial and end subgroups, *p* = {0, *N*}, *q* = 0, let the two function values be *f*(0, 0, *N*) = *f*(0, *N*, *N*) = 1.

For other subgroups, each case 0 < *p* < *N*, 0 < *q* ≤ ⌊*N*/2⌋ is a subgroup under a given condition. The elementary equation of generative triangular numbers is proposed in binomial coefficient form in Eq. 6.

$$f(q, p, N) = \frac{N}{N - p} \binom{N - p}{q} \binom{p - 1}{q - 1}.\tag{6}$$

**Table 5** *N* = 5, *f* (*q*, *p*, 5) subgroup numbers


Using this elementary equation, the list of values can be verified. For example, $f(1, 1, 5) = \frac{5}{4}\binom{4}{1}\binom{0}{0} = 5$; $f(2, 3, 5) = \frac{5}{2}\binom{2}{2}\binom{2}{1} = 5$; ...; $f(2, 4, 5) = \frac{5}{1}\binom{1}{2}\binom{3}{1} = 0$. All {*f*(*q*, *p*, 5)} calculations are listed in Table 5.
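The elementary equation can be evaluated mechanically. The following is a minimal sketch, assuming the binomial convention of Definition 3 and the end-point values of Definition 6; the names `binom` and `f` are chosen here for illustration, and the explicit guard for *p* = *N* is an assumption made to avoid the division by zero in Eq. 6.

```python
from fractions import Fraction
from math import comb

def binom(n, k):
    """Binomial coefficient under Definition 3 (zero when k < 0 or k > n)."""
    if n == k:
        return 1
    if k < 0 or k > n:
        return 0
    return comb(n, k)

def f(q, p, n):
    """Generative triangular number f(q, p, N) from Eq. 6 and Definition 6."""
    if q == 0:
        # Definition 6 fixes the two end subgroups; other q = 0 entries are zero.
        return 1 if p in (0, n) else 0
    if p == n:
        # Eq. 6 would divide by zero here; the value is taken as zero.
        return 0
    value = Fraction(n, n - p) * binom(n - p, q) * binom(p - 1, q - 1)
    assert value.denominator == 1  # each value counts vectors, so it is an integer
    return int(value)

N = 5
for p in range(N + 1):
    print(p, [f(q, p, N) for q in range(N // 2 + 1)])
```

Exact rational arithmetic is used so that the factor *N*/(*N* − *p*) never introduces rounding error before it cancels.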

**Corollary 1** *The elementary equation has equivalent identities on a pair of*{*p*, *N* − *p*}*.*

$$\begin{split} f(q,p,N) &= \frac{N}{N-p} \binom{N-p}{q} \binom{p-1}{q-1} \\ &= \frac{N}{N-(N-p)} \binom{N-(N-p)}{q} \binom{(N-p)-1}{q-1} \\ &= f(q,N-p,N). \end{split} \tag{7}$$

*Proof* Using the elementary equation, we have

$$\begin{split} f(q,p,N) &= \frac{N}{N-p} \binom{N-p}{q} \binom{p-1}{q-1}; \text{ (equation 6)}\\ &= \frac{N}{(N-p)} \frac{(N-p)!}{(N-p-q)!q!} \binom{p-1}{q-1}\\ &= \frac{N}{q} \frac{(N-p-1)!}{(N-p-q)!(q-1)!} \binom{p-1}{q-1}\\ &= \frac{N}{q} \binom{N-p-1}{q-1} \binom{p-1}{q-1}\\ &= \frac{N}{q} \binom{p-1}{q-1} \binom{N-p-1}{q-1}\\ &= \frac{N}{p} \binom{p}{q} \binom{N-p-1}{q-1}\\ &= \frac{N}{N-(N-p)} \binom{N-(N-p)}{q} \binom{(N-p)-1}{q-1}; \text{ (equation 7)}\\ &= f(q,N-p,N). \end{split}$$

The *p* parameter runs in the vertical direction. In the general case, for any given *N*, the triangular numbers can be arranged as in Table 6 (Fig. 1).

**Table 6** *N* = 5, *f* (*q*, *p*, 5) subgroup numbers in vertical direction


$$\begin{array}{lllll}
f(0,0,N) & & & & \\
 & f(1,1,N) & & & \\
 & \cdots & \cdots & & \\
 & f(1,q,N) & \cdots & f(q,q,N) & \\
 & \cdots & \cdots & \cdots & \cdots \\
 & \cdots & \cdots & \cdots & f(\lfloor \frac{N}{2} \rfloor, \lfloor \frac{N}{2} \rfloor, N) \\
 & \cdots & \cdots & \cdots & f(\lfloor \frac{N}{2} \rfloor, \lceil \frac{N}{2} \rceil, N) \\
 & f(1,p,N) & \cdots & f(q,p,N) & \cdots \\
 & \cdots & \cdots & \cdots & \cdots \\
 & f(1,N-q,N) & \cdots & f(q,N-q,N) & \\
 & \cdots & \cdots & & \\
 & f(1,N-1,N) & & & \\
f(0,N,N) & & & &
\end{array}$$

$$0 \le q \le \lfloor \frac{N}{2} \rfloor, 0 \le p \le N$$

**Fig. 1** Triangular numbers for a given *N* > 1

#### **4 Local Propensities**

It is necessary to investigate different relationships for symmetry properties from the elementary equations to distinguish functions for generative triangular numbers.

#### *4.1 Nontrivial Areas*

**Corollary 2** (A pair of symmetric properties) *In either* 0 < *q* ≤ *p* ≤ *N* − *q or q* = 0, *p* = {0, *N*}*, a pair of nontrivial trinomial coefficients on triangular numbers satisfies*

$$f(q, p, N) = f(q, N - p, N). \tag{8}$$

*Proof* Using the elementary equation, two cases are required.

Case 1: If *q* > 0, Eqs. 6 and 7 provide the relevant combinatorial identities.

Case 2: If *q* = 0, we have *f*(0, 0, *N*) = *f*(0, *N*, *N*) = 1 by Definition 6.

#### *4.2 Trivial Areas*

**Corollary 3** (Five areas for trivial values) *If case 1—q* > 0, 0 < *p* < *q; case 2— N* − *q* < *p* < *N ; case 3—q* = 0, 0 < *p* < *N ; case 4—q* > 0, *p* = 0*; case 5—q* > 0, *p* = *N, then*

$$f(q, p, N) = 0.\tag{9}$$

*Proof* For cases 1, 2 and 3, we have

$$\begin{split} f(q,p,N) &= \frac{N}{N-p} \binom{N-p}{q} \binom{p-1}{q-1} \\ &= \frac{N}{N-p} \binom{N-p}{q} \left[ \binom{p-1}{q-1} = 0 \right], \; 0 < p < q \; : \; \text{Case 1} \\ &= \frac{N}{N-p} \left[ \binom{N-p}{q} = 0 \right] \binom{p-1}{q-1}, \; N-q < p < N \; : \; \text{Case 2} \\ &= \frac{N}{N-p} \binom{N-p}{0} \left[ \binom{p-1}{-1} = 0 \right], \; q = 0, 0 < p < N \; : \; \text{Case 3} \\ &= 0. \end{split}$$

For cases 4 and 5, we have

$$\begin{aligned} f(q,p,N) &= \frac{N}{q} \binom{N-p-1}{q-1} \binom{p-1}{q-1} \\ &= \frac{N}{q} \binom{N-1}{q-1} \left[ \binom{-1}{q-1} = 0 \right], \; q > 0, p = 0 \; : \; \text{Case 4} \\ &= \frac{N}{q} \left[ \binom{-1}{q-1} = 0 \right] \binom{N-1}{q-1}, \; q > 0, p = N \; : \; \text{Case 5} \\ &= 0. \end{aligned}$$
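As a worked illustration of Corollaries 2 and 3 (the choice *N* = 6 is ours, for illustration only), Eq. 6 gives a symmetric pair in the nontrivial area and zeros in the trivial areas:

$$\begin{aligned} f(2,2,6) &= \frac{6}{4} \binom{4}{2} \binom{1}{1} = 9, & f(2,4,6) &= \frac{6}{2} \binom{2}{2} \binom{3}{1} = 9; \\ f(3,2,6) &= \frac{6}{4} \binom{4}{3} \binom{1}{2} = 0 \; (\text{Case 1}), & f(3,4,6) &= \frac{6}{2} \binom{2}{3} \binom{3}{2} = 0 \; (\text{Case 2}). \end{aligned}$$

The first line shows *f*(*q*, *p*, *N*) = *f*(*q*, *N* − *p*, *N*) for *q* = 2, *p* = 2; in the second line one binomial factor vanishes under Definition 3 because its lower index exceeds its upper index.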

#### **5 Projection Properties**

#### *5.1 Linear Projection*

In this section, the algebraic properties of linear projection are investigated.

**Definition 7** Let *L*(*p*, *N*) denote a function as a linear projection to collect all possible values for a given *p*, 0 ≤ *p* ≤ *N*.


**Table 7** *N* = 5, *f* (*q*, *p*, 5) subgroup numbers and two projections

For the case of *N* = 5, two projections and their generative triangular numbers are shown in Table 7, respectively.

The following theorems and corollaries are established.

**Theorem 4** *If* $L(p, N) = \sum\_{q=1}^{p} f(q, p, N)$, 0 < *p* < *N*, *then the projection function L*(*p*, *N*) *is a binomial coefficient and*

$$L(p, N) = \binom{N}{p} \,. \tag{10}$$

*Proof* For a fixed *p*, 0 < *p* < *N*, all possible { *f* (*q*, *p*, *N*)} are collected to form the following equation:

$$\begin{aligned} L(p,N) &= \sum\_{q=1}^{p} f(q,p,N) \\ &= \sum\_{q=1}^{p} \frac{N}{N-p} \binom{N-p}{q} \binom{p-1}{q-1} \\ &= \frac{N}{N-p} \sum\_{q=1}^{p} \binom{N-p}{q} \binom{p-1}{q-1} \\ &= \frac{N}{N-p} \sum\_{q=1}^{p} \binom{N-p}{q} \binom{p-1}{p-q}; \quad \binom{n}{k} = \binom{n}{n-k} \\ &= \frac{N}{N-p} \binom{N-1}{p}; \quad \binom{x+y}{n} = \sum\_{k=0}^{n} \binom{x}{k} \binom{y}{n-k} \\ &= \frac{N}{(N-p)} \frac{(N-1)!}{(N-p-1)!p!} \\ &= \frac{N!}{(N-p)!p!} \\ &= \binom{N}{p} .\end{aligned}$$
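A single instance makes the steps of this proof concrete. Taking *N* = 6 and *p* = 3 (an illustrative choice, not a case singled out in the text), the three nontrivial terms and the Vandermonde step read:

$$\begin{aligned} L(3,6) &= f(1,3,6) + f(2,3,6) + f(3,3,6) = 6 + 12 + 2 = 20, \\ \frac{6}{3} \sum\_{q=1}^{3} \binom{3}{q} \binom{2}{q-1} &= 2\,(3 \cdot 1 + 3 \cdot 2 + 1 \cdot 1) = 2 \binom{5}{3} = 20 = \binom{6}{3}. \end{aligned}$$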

For a complete sequence of binomial coefficients, it is necessary to include both initial and end subgroups. Further corollaries can be established.

**Corollary 5** *For any given N* > 0 *under the listed condition, a set of projection function* {*L*(*p*, *N*)}, 0 ≤ *p* ≤ *N is composed of the same sequence of binomial coefficients*

$$L(p, N) = \binom{N}{p} \, . \tag{11}$$

*Proof* For the 0 < *p* < *N* condition, the values are determined by Theorem 4; for the two end subgroups *p* = {0, *N*}, $\binom{N}{0} = \binom{N}{N} = 1$ by the defined initial conditions.

**Corollary 6** *The sum of all possible* $\{L(p, N)\}\_{p=0}^{N}$ *is*

$$\sum\_{p=0}^{N} L(p, N) = 2^N. \tag{12}$$

*Proof* Collecting all possible numbers by Corollary 5, we have

$$\begin{aligned} \sum\_{p=0}^{N} L(p, N) &= \sum\_{p=0}^{N} \binom{N}{p} \\ &= (1 + 1)^N \\ &= 2^N. \end{aligned}$$

**Corollary 7** *For* 0 ≤ *p* ≤ *N, a pair of functions has an equivalent formula*

$$L(p, N) = L(N - p, N). \tag{13}$$

*Proof* By Corollary 2, *f*(*q*, *p*, *N*) = *f*(*q*, *N* − *p*, *N*) for every *q*, so both projections are equal.

**Theorem 8** *For any N* > 0*, the sum of all possible functions on* $\{f(q, p, N)\}\_{\forall p, \forall q}$ *or* $\{L(p, N)\}\_{p=0}^{N}$ *is equal to* 2<sup>*N*</sup>

$$\sum\_{\forall p} \sum\_{\forall q} f(q, p, N) = \sum\_{p=0}^{N} L(p, N) = 2^N. \tag{14}$$

*Proof* By Corollary 6, two equations are equal.
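The projection results of this subsection can be checked numerically for a range of *N*. The sketch below is illustrative: it uses the equivalent form *N*/*q* · C(*N* − *p* − 1, *q* − 1) · C(*p* − 1, *q* − 1) obtained in the proof of Corollary 1, and the function names `f` and `L` and the tested range are assumptions made here.

```python
from math import comb

def f(q, p, n):
    """Triangular number in the form N/q * C(N-p-1, q-1) * C(p-1, q-1)."""
    if p in (0, n):
        return 1 if q == 0 else 0  # end subgroups from Definition 6
    if q == 0:
        return 0                   # trivial area, 0 < p < N
    return n * comb(n - p - 1, q - 1) * comb(p - 1, q - 1) // q

def L(p, n):
    """Linear projection: collect f(q, p, N) over every q for a fixed p."""
    return sum(f(q, p, n) for q in range(n // 2 + 1))

for n in range(1, 13):
    row = [L(p, n) for p in range(n + 1)]
    assert row == [comb(n, p) for p in range(n + 1)]  # Theorem 4, Corollary 5
    assert row == row[::-1]                           # Corollary 7
    assert sum(row) == 2 ** n                         # Corollary 6, Theorem 8
```

Summing over all *q* up to ⌊*N*/2⌋ rather than only up to *p* changes nothing, because the extra terms fall in the trivial areas and are zero.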

#### *5.2 Triangular Sequence*

**Definition 8** For a given *N* ≥ 1, let *T* (*N*) denote a 2D structure with all nontrivial triangular numbers.

$$T(N) = \{ f(q, p, N) | f(q, p, N) > 0, \, 0 \le q \le \lfloor N/2 \rfloor, \, 0 \le p \le N \} \quad (15)$$

**Corollary 9** *For a given N, if* |*T*(*N*)| *is the total number of distinguishable elements of nontrivial triangular numbers, then* |*T*(*N*)| *satisfies the following equation:*

$$|T(N)| = \begin{cases} N^2/4 + 2; & N \equiv 0, \pmod{2} \\ (N^2 - 1)/4 + 2; & N \equiv 1, \pmod{2}. \end{cases} \tag{16}$$

*Proof* By Corollary 2, for a given *N*, the triangular shape of nontrivial members is composed of two parts: a triangular area and the two *q* = 0 points. The triangular area has length (*N* − 1) and height ⌊*N*/2⌋. If *N* ≡ 0 (mod 2), the triangular area is a regular triangle containing *N*<sup>2</sup>/4 elements, so the total number of elements of the generative triangular shape is *N*<sup>2</sup>/4 + 2. For an odd-valued *N*, the triangular area has an additional ⌊*N*/2⌋ members beside a regular triangle with ⌊*N*/2⌋<sup>2</sup> elements, so the total number of elements is ⌊*N*/2⌋<sup>2</sup> + ⌊*N*/2⌋ + 2 = (*N*<sup>2</sup> − 1)/4 + 2.
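The counting argument can be checked against Eq. 16 directly. In this minimal sketch, the two function names are hypothetical; `count_by_rows` adds the two *q* = 0 end points to the number of positions *q* ≤ *p* ≤ *N* − *q* on each level *q* ≥ 1.

```python
def count_by_rows(n):
    """Two q = 0 end points plus the positions q <= p <= n - q for each q >= 1."""
    return 2 + sum(n - 2 * q + 1 for q in range(1, n // 2 + 1))

def count_by_formula(n):
    """Closed form of Eq. 16."""
    if n % 2 == 0:
        return n * n // 4 + 2
    return (n * n - 1) // 4 + 2

for n in range(1, 50):
    assert count_by_rows(n) == count_by_formula(n)

print([count_by_formula(n) for n in (4, 5, 6, 16, 17)])  # [6, 8, 11, 66, 74]
```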

**Definition 9** For a given *N* ≥ 1, let *TS*(*N*) denote an integer sequence with |*T*(*N*)| elements for all nontrivial triangular numbers in *T*(*N*)

$$\begin{aligned} TS(N) := [ & f(0, 0, N), f(0, N, N), \dots, \\ & \dots, f(q, q, N), \dots, f(q, p, N), \dots, f(q, N - q, N), \dots, \\ & \dots, f(\lfloor N/2 \rfloor, \lfloor N/2 \rfloor, N), f(\lfloor N/2 \rfloor, \lceil N/2 \rceil, N)], \\ & 1 \le q \le \lfloor N/2 \rfloor, \ q \le p \le N - q. \end{aligned} \tag{17}$$

#### *5.3 Linear Sequence*

**Definition 10** For a given *N* ≥ 1, let *L*(*N*) denote a 1D structure with relevant linear numbers.

$$L(N) = \{ L(p, N) | 0 \le p \le N \} \tag{18}$$

**Corollary 10** *For a given N, if* |*L*(*N*)| *is the total number of distinguishable elements of linear numbers, then* |*L*(*N*)| *satisfies Eq.* 19*.*

$$|L(N)| = N + 1\tag{19}$$


**Table 8** {*T* (4), *T* (5), *T* (6)},{*L*(4), *L*(5), *L*(6)} subgroup numbers in three levels

**Definition 11** For a given *N* ≥ 1, let *LS*(*N*) denote an integer sequence with |*L*(*N*)| elements for all linear numbers in *L*(*N*) (Table 8)

$$LS(N) := [L(0, N), \dots, L(p, N), \dots, L(N, N)], \ 0 \le p \le N. \qquad (20)$$

From the listed six groups of {*T* (4), *T* (5), *T* (6)} and {*L*(4), *L*(5), *L*(6)} structures, two integer sequences are arranged as follows:

*TS*(4), *TS*(5), *TS*(6) := [1, 1, 4, 4, 4, 2, 1, 1, 5, 5, 5, 5, 5, 5, 1, 1, 6, 6, 6, 6, 6, 9, 12, 9, 2];

*LS*(4), *LS*(5), *LS*(6) := [1, 4, 6, 4, 1, 1, 5, 10, 10, 5, 1, 1, 6, 15, 20, 15, 6, 1].
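Both integer sequences can be generated from the elementary equation. The following is a sketch under stated assumptions rather than the authors' program: the names `f`, `TS` and `LS` are chosen here, Eq. 6 is evaluated in integer arithmetic, and the element order is taken as the two *q* = 0 end points followed by each level *q* with *p* running from *q* to *N* − *q*, as in Eq. 17.

```python
from math import comb

def f(q, p, n):
    """Generative triangular number: Definition 6 end points, otherwise Eq. 6."""
    if p in (0, n):
        return 1 if q == 0 else 0
    if q == 0:
        return 0
    # Eq. 6 in integer arithmetic; the division is exact for these counts.
    return n * comb(n - p, q) * comb(p - 1, q - 1) // (n - p)

def TS(n):
    """Nontrivial triangular numbers in the order of Eq. 17."""
    seq = [f(0, 0, n), f(0, n, n)]
    for q in range(1, n // 2 + 1):
        seq += [f(q, p, n) for p in range(q, n - q + 1)]
    return seq

def LS(n):
    """Linear numbers L(0, N), ..., L(N, N) in the order of Eq. 20."""
    return [sum(f(q, p, n) for q in range(n // 2 + 1)) for p in range(n + 1)]

print(TS(4) + TS(5) + TS(6))
print(LS(4) + LS(5) + LS(6))
```

Both printed lists should match the two sequences given above; each `TS(n)` and each `LS(n)` also sums to 2<sup>*n*</sup>, in line with Theorem 8.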

#### **6 Sample Cases**

Two sample cases, *N* = {16, 17}, are selected to show their triangular numbers and generative structures in Table 9. In relation to the relevant integer sequences, both {*L*(16), *L*(17)} and {*T*(16), *T*(17)} are shown in Table 9. The two integer sequences are significantly different. The triangular number sequence in this case, with a total length of 140 integers, is four times as long as the linear number sequence, with a total length of 35 integers. The two integer sequences represent different partition results on the same number 196608 = 2<sup>16</sup> + 2<sup>17</sup> for generative binomial and trinomial coefficients, respectively.

**Table 9** Triangular number arrays for *N* = {16, 17} cases
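The quoted lengths and the common total follow from Eqs. 16 and 19 and Theorem 8; the arithmetic is written out here for convenience:

$$\begin{aligned} |T(16)| &= 16^2/4 + 2 = 66, & |T(17)| &= (17^2 - 1)/4 + 2 = 74, & 66 + 74 &= 140; \\ |L(16)| &= 16 + 1 = 17, & |L(17)| &= 17 + 1 = 18, & 17 + 18 &= 35; \\ 2^{16} + 2^{17} &= 65536 + 131072 = 196608. \end{aligned}$$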

#### **7 Conclusion**

Owing to its excellent symmetric properties on a 2D grid, similar to those of binomial coefficients on a 1D line, the proposed elementary equation of trinomial coefficients allows a projection operation to reduce the 2D *T*(*N*) array to the 1D linear *L*(*N*) array. Two types of integer sequences, *TS*(*N*) and *LS*(*N*), can be generated. As the simplest expansion of multinomial coefficients, discrete 2D geometry could provide a solid combinatorial foundation to support multinomial explorations.

From a combinatorial geometry viewpoint, triangular numbers provide a key construction linking trinomial and binomial representations in the mathematical foundation. Trinomial integer sequences, as representatives, need to be explored in depth by the modern combinatorial and discrete mathematics communities. Further explorations are expected on detailed analysis, systematic construction, and practical applications.

**Acknowledgements** Both authors would like to thank Mr. Zhonghao Yang for his contribution to the work on sample sequences, and sincerely thank @Qiaochu Yuan for the suggestion of a combinatorial description and @Zander for providing a set of combinatorial equations to answer @zcaudate's question [18] in 2012.

#### **References**



# **Symmetric Clusters in Hierarchy with Cryptographic Properties**

**Jeffrey Zheng**

**Abstract** Symmetric Boolean functions play a key role in stream ciphers. Symmetric constructions provide core components in cryptographic applications. In this chapter, four meta symmetric clustering schemes (combination, crossing, variant and rotation) are organized in a hierarchy for *n* variables of 0–1 vectors in measuring phase spaces. Local counting properties in a cluster and global counting properties in a given level are formulated. From selected symmetric clusters, a number of various symmetric Boolean functions are formulated. Counting properties on symmetric clusters, vectors in selected clusters and special symmetric Boolean functions are listed. Four sets of symmetric Boolean functions are compared. Properties of symmetric clusters and Boolean functions are discussed. Main results are expressed in theorems and tables. Among the four meta schemes, the variant scheme presents novel properties, with approximately *O*(*n*<sup>2</sup>/4) clusters on a 2D phase space, different from the other schemes: combinatorial *O*(*n*), crossing *O*(*n*/2) and rotation *O*(2<sup>*n*</sup>/*n*) on 1D measuring phase spaces, respectively. The variant pseudorandom number generator takes an approach similar to the RC4 and HC128 stream ciphers, using word-oriented 0–1 vectors. Further advanced research and exploration of relevant optimal configurations are required.

**Keywords** Symmetric construction · Meta symmetric Cluster · hierarchy Boolean function · Four meta schemes · Phase space

J. Zheng (B) Key Laboratory of Software Engineering of Yunnan, Kunming, China e-mail: conjugatelogic@yahoo.com

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

© The Author(s) 2019 J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_5

#### **1 Introduction**

Symmetric Boolean functions [5] have been widely used as components of different cryptosystems [25] (e.g. in stream ciphers, block ciphers or hash functions). In combinatorial mathematics [10], a symmetric Boolean function is a Boolean function whose value does not depend on the permutation of its input bits [4], i.e. it depends only on the number of ones in the input on *n* variables of 0–1 vectors [21]. A total of 2<sup>*n*</sup> vectors compose a vector space or a phase space for the construction [19]. A specific symmetric Boolean function must have invariant properties under a special group of permutations [18]. For example, rotation symmetric Boolean functions are invariant under the circular translation of indices. In addition to rotation symmetric properties, multiple invariants (combination, crossing, reflection, translation) may be composed from various symmetric subgroups of permutations [10, 22]. Various combinatorial counting schemes have been explored [34–36].

#### *1.1 Symmetric Functions—Combinatorial Invariant*

From a combinatorial viewpoint, symmetric Boolean functions are a combinatorial invariant linked to the number of one elements *p*, 0 ≤ *p* ≤ *n*, in a vector [35]. In combinatorics, this type of function has been linked to binomial coefficients, and normally there are *n* + 1 partitions that divide a measuring phase space into various clusters by this parameter [30]. Symmetric Boolean functions are characterized [36] by the fact that their outputs depend only on the *p* numbers of their inputs. The usefulness of symmetric functions, which possess good cryptographic properties, has been widely explored in a cryptographic context [6, 7].

#### *1.2 Crossing Number - Topological Invariant*

A zero-crossing [23] describes a point where the sign of a mathematical function changes (e.g. from positive to negative), represented by a crossing of the axis (zero value) in the graph of the function. It is a commonly used term in electronics, mathematics, sound and image processing.

From a measuring viewpoint, a 0–1 vector with *n* bits can be expressed as a circular ring that has a fixed crossing number *q*, 0 ≤ *q* ≤ *n*/2, which counts the number of derivative changes on either 0–1 or 1–0, respectively. This type of derivative invariant has been widely used in cryptanalysis for many years. In NIST random data testing packages [1], binary derivative [3] and Runs tests [2] play an important role in measuring the randomness of a binary sequence formed by a pseudorandom number generator for use in cipher systems. From an analytic viewpoint, this parameter is a topological invariant, different from a combinatorial invariant, and provides another type of partition capacity to organize a set of clusters in a measuring phase space.

#### *1.3 Rotation Symmetric Functions - Geometric Invariant*

In combinatorial mathematics, rotation symmetric properties have been widely explored as a geometric invariant since the early stages of abstract group theory and symmetric group constructions [10, 22]. Filiol and Fontaine [12] initially explored balanced Boolean functions with good correlation immunity. Pieprzyk and Qu [26] applied Rotation Symmetric Boolean Functions (RSBF) in cryptographic applications as components in the rounds of a hashing algorithm.

Extensive R&D activities on RSBF have continued over the last decades, and a range of advanced topics has been explored, such as degree and non-linearity [6], optimal algebraic immunity [7], bent and semi-bent functions [8, 33], non-linearity of resilient, nonlinear Boolean functions [20, 28], balanced Boolean functions [12, 16], non-linear balanced Boolean functions [31], weights and non-linearity [11], immune combining functions [32], count and cryptographic properties [13, 29], etc.

#### *1.4 Trinomial Coefficients*

It is a natural approach [10, 18, 19] to apply binomial coefficients to partition a measuring phase space on 0–1 vector sets. However, when the number of parameters increases to three or more, a generalization [34–36] using multinomial coefficients may not provide a general solution for further refined partitions if the processed phase space is composed of 0–1 vectors. A trinomial expression is convenient for showing this fact.

Let *n* = *n*<sub>1</sub> + *n*<sub>2</sub> + *n*<sub>3</sub>, 0 < *n*,

$$
\binom{n}{n\_1, n\_2, n\_3} = \frac{n!}{n\_1! n\_2! n\_3!},
$$

collecting all possible trinomial coefficients, we have

$$\sum\_{\forall n\_1, n\_2, n\_3} \binom{n}{n\_1, n\_2, n\_3} = 3^n \neq 2^n. \tag{1}$$

From Eq. 1, it is interesting to notice that trinomial coefficients provide further segments to partition three-valued 0–2 vectors. Due to this reason, extensions using multinomial coefficients may not be directly relevant to binary-valued 0–1 vector sets. Refined identity equations of combinatorics are required [14, 15].
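The mismatch stated in Eq. 1 is easy to confirm numerically. The following minimal sketch uses a hypothetical helper `trinomial_sum` to add up all trinomial coefficients for a given *n*, then compares the total with 3<sup>*n*</sup> and 2<sup>*n*</sup>.

```python
from math import factorial

def trinomial_sum(n):
    """Sum n! / (n1! n2! n3!) over all n1 + n2 + n3 = n with non-negative parts."""
    total = 0
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            n3 = n - n1 - n2
            total += factorial(n) // (factorial(n1) * factorial(n2) * factorial(n3))
    return total

for n in range(1, 9):
    assert trinomial_sum(n) == 3 ** n  # counts three-valued vectors of length n
    assert trinomial_sum(n) != 2 ** n  # so it cannot partition the 2**n binary vectors
```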

#### *1.5 Variant Symmetric Schemes - Variant Invariants*

Various schemes that use multiple invariants to partition special phase spaces have been explored in binary image analysis and processing for many years. In the 1990s, Zheng [39, 40] proposed conjugate classifications that apply seven invariants in a hierarchy to partition the kernels of four regular plane lattices in the *n* = {4, 5, 7, 9} cases for 2D binary images. For *n*-tuple 0–1 vectors, variant logic frameworks [41, 42] were proposed in the 2010s, and various applications have been explored, such as a 3D visual method [37], the variant Pseudorandom Number Generator (PRNG) [38, 43], computational simulation of quantum interactions [44–47] and non-coding DNA analysis [48–50].

#### *1.6 Organization of the Chapter*

In this chapter, an algebraic equation of variant trinomial will be proposed as a kernel structure to arrange a hierarchical phase space. This extension provides a general framework of multiple symmetric operations to support three numeric numbers: combinatorial, crossing and variant in a hierarchy. Three meta clusters of measuring phase spaces are identified by the three invariants: {*n*, *p*, *q*} and their combinations. Refined levels can be compared with the rotation symmetric scheme under *n* = {1, 2, 3, 4, 5} conditions. Similarities and differences among the four schemes are explored.

In Sect. 2, symbols and local counting properties of symmetric clusters in measuring spaces are defined, algebraic equations are formulated and two important projections are discussed. In Sect. 3, variant symmetric clusters and their elementary equation are proposed. In Sect. 4, four number sets of symmetric clusters are explored from a global viewpoint. In Sect. 5, symmetric Boolean functions of selected clusters are constructed and both algebraic and approximate numeric properties are discussed. In Sect. 6, cryptographic properties of symmetric Boolean functions in a hierarchy are discussed and special properties on the variant scheme are stressed. Section 7 is the conclusion of the chapter. Main results of the chapter are expressed in a list of theorems and corollaries in Sects. 2–5, respectively.

#### **2 Symmetric Clusters in Measuring Phase Spaces**

In this section, basic symbols, primary definitions and algebraic formulas are defined for different clusters in their measuring phase spaces.

#### *2.1 Basic Symbols*

Main symbols in this chapter are listed in Table 1.

#### *2.2 Primary Definitions*

**Definition 1** (*x an n*-*tuple vector on 0–1 variables*) Let *x* be a 0–1 vector with *n* length.

$$x = (x\_{n-1}, \dots, x\_i, \dots, x\_0), \quad 0 \le i < n, \quad x\_i \in \{0, 1\} = B\_2, \quad x \in B\_2^n,\tag{2}$$

e.g. *x* = 110010, *n* = 6.

**Table 1** Basic symbols


**Definition 2** (*I index for a vector x*) For a vector *x*, let *I* or *I*(*x*) be an index:

$$I = I(x) = \sum\_{i=0}^{n-1} x\_i \ast 2^i,\tag{3}$$

e.g. *x* = 110010, *I*(*x*) = 2<sup>5</sup> + 2<sup>4</sup> + 2 = 32 + 16 + 2 = 50.

**Definition 3** (Ω(*n*) *a full set of n*-*tuple 0–1 vectors*) Let Ω(*n*) be a vector space or a phase space of all *n*-tuple 0–1 vectors,

$$\Omega(n) = \{ \forall x \mid 0 \le I < 2^n, x \in B\_2^n \} \text{ and } \Omega(n) = B\_2^n. \tag{4}$$

**Definition 4** Let *f*<sub>Ω</sub>(*n*) denote the number of vectors in Ω(*n*).

**Lemma 1** *f*<sub>Ω</sub>(*n*) *is equal to* 2<sup>*n*</sup>*.*

*Proof* For a vector *x* ∈ *B*<sub>2</sub><sup>*n*</sup> ranging from 0 ... 0 to 1 ... 1, its index *I* covers the full range 0 ≤ *I* < 2<sup>*n*</sup>, so Ω(*n*) contains 2<sup>*n*</sup> distinct vectors and *f*<sub>Ω</sub>(*n*) = 2<sup>*n*</sup>.
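As a minimal sketch of Definitions 1–3 and Lemma 1, the following Python fragment represents a vector *x* as a string written in the order *x*<sub>*n*−1</sub> ... *x*<sub>0</sub>. The helper names `index` and `omega` are illustrative assumptions and are not notation from this chapter.

```python
from itertools import product


def index(x: str) -> int:
    """I(x) = sum of x_i * 2^i, with x written as x_{n-1} ... x_0 (Eq. 3)."""
    n = len(x)
    # The character at string position pos is the component x_{n-1-pos}.
    return sum(int(bit) << (n - 1 - pos) for pos, bit in enumerate(x))


def omega(n: int) -> list[str]:
    """All n-tuple 0-1 vectors, listed in increasing index order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


print(index("110010"))            # 50
print(len(omega(6)) == 2 ** 6)    # True
```

Under these assumptions, the indices of the listed vectors run through 0, 1, ..., 2<sup>*n*</sup> − 1 without repetition, which is the counting argument used in the proof of Lemma 1.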

**Definition 5** (*Measuring Phase Space*) If a phase space can be organized by various invariants, then it is a measuring phase space and its dimension is determined by a number of active invariants.

**Corollary 1** *For any n* > 0*,* Ω(*n*) *is a measuring phase space in zero dimension.*

*Proof* For any *n* > 0, Ω(*n*) is composed of one cluster of vectors as a single point.

**Definition 6** (*R rotation operator*) Let *R*(*x*; *r*) be a rotation operator that rotates a vector *x* by *r* positions, −*n* < *r* < *n*:

$$\begin{split} R(x; r) &= R(x\_{n-1}, \dots, x\_{i}, \dots, x\_{0}; r) \\ &= (x\_{(n-1+r) \bmod n}, \dots, x\_{(i+r) \bmod n}, \dots, x\_{(0+r) \bmod n}), \end{split} \tag{5}$$

e.g. *x* = 110010, {*R*(*x*; *r*)}<sub>*r*=0</sub><sup>5</sup> = {110010, 100101, 001011, 010110, 101100, 011001} with six distinct vectors.
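The rotation operator can be sketched in Python as follows, assuming a vector is stored as a string in the order *x*<sub>*n*−1</sub> ... *x*<sub>0</sub>. The names `rotate` and `orbit` are hypothetical. The sketch follows the index convention of Eq. 5 literally, so single rotations may appear in a different order from the listing in the example, while the resulting set of vectors is the same.

```python
def rotate(x: str, r: int) -> str:
    """R(x; r): the component at position i becomes x_{(i + r) mod n} (Eq. 5).

    String position j stores x_{n-1-j}, so the rotated string takes its
    j-th character from string position (j - r) mod n.
    """
    n = len(x)
    return "".join(x[(j - r) % n] for j in range(n))


def orbit(x: str) -> set[str]:
    """All distinct vectors reachable from x by rotation."""
    return {rotate(x, r) for r in range(len(x))}


print(sorted(orbit("110010")))   # the six distinct rotations of 110010
print(len(orbit("110110")))      # 3, since this vector repeats after 3 steps
```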

**Lemma 2** (Maximal cyclic structure) *Starting from any vector x, at most n distinct vectors can be generated by the rotation operator.*

*Proof* From any *x*, a set {*R*(*x*; *r*)}<sub>*r*=0</sub><sup>*n*−1</sup> of *n* vectors can be generated. If this sequence of *n* vectors runs through its cycle more than once, then the number of distinct vectors is less than *n*.

For example, *x* = 110110, {*R*(*x*; *r*)}<sub>*r*=0</sub><sup>5</sup> = {110110, 101101, 011011, 110110, 101101, 011011} with only three distinct vectors: {110110, 101101, 011011}.

**Definition 7** (*F reflection operator*) Let *F*(*x*) be a reflection operator,

$$F(\mathbf{x}) = F(\mathbf{x}\_{n-1}, \dots, \mathbf{x}\_i, \dots, \mathbf{x}\_0) = (\mathbf{x}\_0, \dots, \mathbf{x}\_i, \dots, \mathbf{x}\_{n-1}), 0 \le i < n. \tag{6}$$

**Lemma 3** (A pair of reflections) *For any vector x, only two results are distinguished under the F*(*x*) *operation: (1) F*(*x*) = *x; (2) F*(*x*) ≠ *x.*

*Proof* (1) If *F*(*x*) = *x*, then the values of the vector *x* are distributed in a centrally symmetric form; (2) if *F*(*x*) ≠ *x*, then the vector *x* does not have a symmetric distribution.

For example, *x* = 110010, *F*(*x*) = 010011; *y* = 110011, *F*(*y*) = 110011.
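A short sketch of the reflection operator, assuming the same string representation of a vector; `reflect` and `is_reflection_symmetric` are illustrative names only.

```python
def reflect(x: str) -> str:
    """F(x): reverse the order of the components (Eq. 6)."""
    return x[::-1]


def is_reflection_symmetric(x: str) -> bool:
    """True for case (1) of Lemma 3, F(x) = x; False for case (2)."""
    return reflect(x) == x


print(reflect("110010"), is_reflection_symmetric("110010"))  # 010011 False
print(reflect("110011"), is_reflection_symmetric("110011"))  # 110011 True
```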

**Definition 8** (*p number of one elements*) Let *p* or *p*(*x*) be a number of one elements in *x*,

$$p = p(\boldsymbol{x}) = \sum\_{i=0}^{n-1} \boldsymbol{x}\_i, \boldsymbol{0} \le p \le n. \tag{7}$$

For example, *x* = 110010, *p*(*x*) = 3; *y* = 110011, *p*(*y*) = 4.

**Definition 9** (*q number of cyclic crossings*) Let *q* or *q*(*x*) be a number of cyclic crossings either 0–1 or 1–0 in a vector *x*,

$$\begin{aligned} q = q(x) &= \sum\_{0 \le i < n} (x\_i \equiv 0) \,\&\, (x\_{i+1} \equiv 1); \quad x\_i, x\_{i+1} \in B\_2, \ (i+1) \bmod n; \\ &= \sum\_{0 \le i < n} (x\_i \equiv 0) \,\&\, (x\_{i-1} \equiv 1); \quad x\_i, x\_{i-1} \in B\_2, \ (i-1) \bmod n; \\ & \quad 0 \le q \le \lfloor \frac{n}{2} \rfloor. \end{aligned} \tag{8}$$

For example, *x* = 110010, *q*(*x*) = 2; *y* = 110011, *q*(*y*) = 1.
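The two invariants *p* and *q* can be computed directly from Definitions 8 and 9. The sketch below assumes the string representation of a vector; the names `p_count`, `q_count` and `q_count_alt` are hypothetical. The two functions for *q* count the cyclic 0–1 boundaries in the two opposite directions, mirroring the two sums in Eq. 8.

```python
def p_count(x: str) -> int:
    """p(x): number of one elements in x (Eq. 7)."""
    return x.count("1")


def q_count(x: str) -> int:
    """q(x): cyclic positions where a 0 is immediately followed by a 1."""
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def q_count_alt(x: str) -> int:
    """Same invariant, counted at the boundaries where a 1 is followed by a 0."""
    n = len(x)
    return sum(x[j] == "1" and x[(j + 1) % n] == "0" for j in range(n))


for x in ("110010", "110011"):
    print(x, p_count(x), q_count(x), q_count_alt(x))
# 110010 3 2 2
# 110011 4 1 1
```

On a cycle every 0–1 boundary is matched by a 1–0 boundary, so both counts agree, and neither changes when the vector is rotated.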

#### *2.3 Counting Properties on Rotation Clusters*

**Definition 10** (*G*(*m*, *n*) *m*-*th rotation symmetric cluster*) Let *G*(*m*, *n*) be the *m*-th rotation symmetric cluster of vectors, *G*(*m*, *n*) = Ω(*n*|*m*) ⊂ Ω(*n*), and let the total number of rotation symmetric clusters be *C<sub>G</sub>*(*n*), 1 ≤ *m* ≤ *C<sub>G</sub>*(*n*),

$$\Omega(n) = \bigcup\_{m=1}^{C\_G(n)} \Omega(n|m) = \bigcup\_{m=1}^{C\_G(n)} G(m,n). \tag{9}$$

**Corollary 2** *The set* {*G*(*m*, *n*)}<sub>*m*=1</sub><sup>*C<sub>G</sub>*(*n*)</sup> *forms a measuring phase space in one dimension.*

*Proof* Using the parameter *m*, {*G*(*m*, *n*)}<sub>*m*=1</sub><sup>*C<sub>G</sub>*(*n*)</sup> can be listed in a linear order.

**Lemma 4** *By Burnside's lemma,* φ *being Euler's phi-function,*

$$C\_G(n) = \frac{1}{n} \sum\_{k|n} \phi(k) 2^{\frac{n}{k}}.\tag{10}$$

*Proof* A brief proof of this lemma can be found in [29].
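Lemma 4 can be checked numerically for small *n*. The sketch below evaluates the counting formula with the exponent read as *n*/*k* (the form used later in Lemma 13) and compares it with a direct enumeration of rotation clusters. The function names `phi`, `c_g` and `c_g_brute` are assumptions made for illustration.

```python
from math import gcd


def phi(k: int) -> int:
    """Euler's phi-function, computed directly from its definition."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def c_g(n: int) -> int:
    """Number of rotation clusters from the Burnside-type formula."""
    total = sum(phi(k) * 2 ** (n // k) for k in range(1, n + 1) if n % k == 0)
    return total // n


def c_g_brute(n: int) -> int:
    """Count rotation clusters by enumerating all 2^n vectors."""
    seen: set[str] = set()
    clusters = 0
    for i in range(2 ** n):
        x = format(i, f"0{n}b")
        if x not in seen:
            clusters += 1
            seen.update(x[r:] + x[:r] for r in range(n))
    return clusters


print([c_g(n) for n in range(1, 7)])   # [2, 3, 4, 6, 8, 14]
```

For *n* = 4 the formula gives (16 + 4 + 4)/4 = 6 clusters.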

**Definition 11** Let *f<sub>G</sub>*(*m*, *n*) denote the number of vectors in the *m*-th cluster *G*(*m*, *n*).

**Corollary 3** *For any f<sub>G</sub>*(*m*, *n*)*,* 1 ≤ *f<sub>G</sub>*(*m*, *n*) ≤ *n.*

*Proof* By Lemma 2, each *f<sub>G</sub>*(*m*, *n*) ≤ *n* in general; for the two special vectors in {0 ... 0, 1 ... 1}, we have *f<sub>G</sub>*(*m*, *n*) = 1.

**Corollary 4** *Collecting all possible rotation clusters, the total number of vectors is equal to f*<sub>Ω</sub>(*n*)*,*

$$\sum\_{m=1}^{C\_G(n)} f\_G(m, n) = 2^n = f\_{\Omega}(n). \tag{11}$$

*Proof* From Lemma 4 and Corollary 3, the clusters together contain the full set of 2<sup>*n*</sup> vectors in Ω(*n*).

**Lemma 5** *For a given n, CG*(*n*) *has an approximate number,*

$$C\_G(n) \approx O(\frac{2^n}{n}).\tag{12}$$

*Proof* Using Corollaries 3 and 4, each distinct cluster contains at most *n* vectors, so such an approximate number arises naturally in enumeration.

It is convenient to list the defined rotation parameters in Table 2 for the *n* = 4 case.

#### *2.4 Counting Properties on Measuring Phase Spaces*

For any vector *x* ∈ Ω(*n*), three measuring parameters {*n*, *p*, *q*} are represented as three invariants. Three measurements transfer a phase space into a set of measuring phase spaces in a hierarchy.


**Table 2** Six rotation clusters, various vectors in {*G*(*m*, 4)}

**Definition 12** (*L*(*p*, *n*) *combinatorial cluster*) Let *L*(*p*, *n*) be a combinatorial cluster of vectors in Ω(*n*), *L*(*p*, *n*) = Ω(*n*|*p*) ⊂ Ω(*n*). Two parameters {*n*, *p*} partition the phase space Ω(*n*) to form a set of clusters {*L*(*p*, *n*)} in a measuring phase space.

$$\Omega(n|p) = L(p, n) = \{ \forall x \mid 0 \le p \le n, x \in \Omega(n) \}. \tag{13}$$

**Corollary 5** *The set* {*L*(*p*, *n*)}<sub>*p*=0</sub><sup>*n*</sup> *forms a measuring phase space in one dimension.*

*Proof* The parameter *p* is the active invariant that arranges the phase space in a linear order.

**Definition 13** Let *C<sub>L</sub>*(*n*) be the number of clusters in ∀*p*, {*L*(*p*, *n*)}.

**Lemma 6** *For a given n,*

$$C\_L(n) = n + 1.\tag{14}$$

*Proof* Using Definition 12, 0 ≤ *p* ≤ *n* and *L*(*p*, *n*) ≠ ∅ for any *p*, so the parameter *p* partitions the whole set Ω(*n*) into *n* + 1 distinct subsets as clusters.

**Definition 14** ( *fL* (*p*, *n*) *combinatorial number*) Let *fL* (*p*, *n*) be a combinatorial number of vectors in a cluster *L*(*p*, *n*).

**Lemma 7** *For a pair of* {*n*, *p*} *parameters,*

$$f\_L(p, n) = \binom{n}{p} \tag{15}$$

*Proof* Using Definition 12, this number is equal to the binomial coefficient for selecting *p* elements from *n* positions.
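Lemmas 6 and 7 can be illustrated by grouping all vectors of Ω(*n*) by their number of one elements. This is a minimal sketch; the function name `combinatorial_cluster_sizes` is an assumption.

```python
from collections import Counter
from itertools import product
from math import comb


def combinatorial_cluster_sizes(n: int) -> Counter:
    """Map p -> number of n-tuple 0-1 vectors with exactly p ones."""
    return Counter(bits.count(1) for bits in product((0, 1), repeat=n))


sizes = combinatorial_cluster_sizes(4)
print(sorted(sizes.items()))   # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
print(all(sizes[p] == comb(4, p) for p in range(5)))   # True
```

For *n* = 4 this gives *n* + 1 = 5 clusters whose sizes are the binomial coefficients 1, 4, 6, 4, 1.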

It is convenient to list the defined measuring parameters in Table 3 for the *n* = 4 case.


**Table 3** Five clusters, various vectors in {*L*(*p*, 4)}

**Definition 15** (*E*(*q*, *n*) *crossing cluster of vectors*) Let *E*(*q*, *n*) be a crossing cluster of vectors in Ω(*n*), *E*(*q*, *n*) = Ω(*n*|*q*) ⊂ Ω(*n*). Two parameters {*n*, *q*} partition the phase space Ω(*n*) to form a set of clusters {*E*(*q*, *n*)} in a measuring phase space.

$$\Omega(n|q) = E(q, n) = \{ \forall x \mid 0 \le q \le \lfloor \frac{n}{2} \rfloor, x \in \Omega(n) \}\tag{16}$$

**Corollary 6** *The set* {*E*(*q*, *n*)}<sub>*q*=0</sub><sup>⌊*n*/2⌋</sup> *forms a measuring phase space in one dimension.*

*Proof* The parameter *q* is the active invariant that arranges the phase space in a linear order.

**Definition 16** Let *C<sub>E</sub>*(*n*) be the number of crossing clusters in ∀*q*, {*E*(*q*, *n*)}.

**Lemma 8** *For a given n* > 0*,*

$$C\_E(n) = \lfloor \frac{n}{2} \rfloor + 1. \tag{17}$$

*Proof* According to Definition 15, 0 ≤ *q* ≤ ⌊*n*/2⌋ and each *E*(*q*, *n*) ≠ ∅, so the parameter *q* partitions the whole set Ω(*n*) into ⌊*n*/2⌋ + 1 distinct subsets as clusters.

**Definition 17** ( *fE* (*q*, *n*) *number of vectors*) Let *fE* (*q*, *n*) be a number of vectors in a cluster *E*(*q*, *n*).

**Lemma 9** *For a pair of* {*n*, *q*} *parameters,*

$$f\_E(q, n) = 2 \* \binom{n}{2q}, 0 \le q \le \lfloor \frac{n}{2} \rfloor. \tag{18}$$

*Proof* Two cases can be distinguished: Case 1: *q* = 0; Case 2: 1 ≤ *q* ≤ ⌊*n*/2⌋.

Case 1: All *n* values are either 1 or 0, so 2 ∗ $\binom{n}{0}$ = 2.

Case 2: For a given *q*, the 2*q* crossing positions are composed of a 0–1 crossing followed by a 1–0 crossing, repeated *q* times in a vector, and this configuration includes a total of $\binom{n}{2q}$ vectors. The same pairs of positions can be exchanged into 1–0 and 0–1 crossings with the same number of different vectors, so a total of 2 ∗ $\binom{n}{2q}$ vectors are involved in each *q* selection.

It is convenient to list the measuring parameters defined above in Table 4 for the *n* = 4 case.

**Table 4** Three clusters, vectors in {*E*(*q*, 4)} cases
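The counting results of Lemmas 8 and 9 can be checked by grouping all vectors by their number of cyclic crossings. The sketch assumes the string representation of a vector; `q_count` and `crossing_cluster_sizes` are hypothetical names.

```python
from collections import Counter
from math import comb


def q_count(x: str) -> int:
    """Number of cyclic positions where a 0 is followed by a 1."""
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def crossing_cluster_sizes(n: int) -> Counter:
    """Map q -> number of vectors in the crossing cluster E(q, n)."""
    return Counter(q_count(format(i, f"0{n}b")) for i in range(2 ** n))


sizes = crossing_cluster_sizes(4)
print(sorted(sizes.items()))   # [(0, 2), (1, 12), (2, 2)]
print(all(sizes[q] == 2 * comb(4, 2 * q) for q in range(3)))   # True
```

For *n* = 4 the three clusters contain 2, 12 and 2 vectors, that is, twice the binomial coefficients with even lower index.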

#### **3 Variant Symmetric Clusters**

**Definition 18** (*V*(*q*, *p*, *n*) *variant cluster*) Let *V*(*q*, *p*, *n*) be a variant cluster of vectors in Ω(*n*), *V*(*q*, *p*, *n*) = Ω(*n*|*p*, *q*) ⊂ Ω(*n*). Three parameters {*n*, *p*, *q*} partition the phase space Ω(*n*) to form a set of clusters {*V*(*q*, *p*, *n*)} in a measuring phase space.

$$\Omega(n|p,q) = V(q,p,n) = \{ \forall x \mid 0 \le p \le n, 0 \le q \le \lfloor \frac{n}{2} \rfloor, x \in \Omega(n) \}\tag{19}$$

**Corollary 7** *The set* {*V*(*q*, *p*, *n*)}<sub>∀*q*,*p*</sub> *forms a measuring phase space in two dimensions.*

*Proof* The invariants *q* and *p* are the two active invariants that arrange the phase space on a 2D plane lattice.

**Lemma 10** *Both* {*L*(*p*, *n*)} *combinatorial clusters and* {*E*(*q*, *n*)} *crossing clusters can be generated from special subsets of* {*V*(*q*, *p*, *n*)} *variant clusters.*

*Proof* For a given *p*, *L*(*p*, *n*) can be determined by

$$L(p,n) = \bigcup\_{q=0}^{\lfloor \frac{n}{2} \rfloor} V(q,p,n).$$

For a given *q*, *E*(*q*, *n*) can be determined by

$$E(q,n) = \bigcup\_{p=0}^{n} V(q,p,n).$$


**Table 5** Three sets of variant clusters for *n* = 4 in {*V*(*q*, *p*, *n*)} condition

Applying this set of partitions, three sets of relevant clusters can be identified.

For example, for *n* = 4 with all 16 vectors in the vector space, three sets of clusters can be distinguished: six clusters for {*V*(*q*, *p*, *n*)}, five clusters for {*L*(*p*, *n*)} and three clusters for {*E*(*q*, *n*)}, as shown in Table 5.
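The *n* = 4 example and the unions in the proof of Lemma 10 can be reproduced with a short enumeration. This is a sketch under the string representation of a vector; the names `variant_clusters`, `combinatorial` and `crossing` are illustrative assumptions.

```python
from collections import defaultdict


def p_count(x: str) -> int:
    return x.count("1")


def q_count(x: str) -> int:
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def variant_clusters(n: int) -> dict[tuple[int, int], list[str]]:
    """Group all n-tuple vectors by the pair (q, p)."""
    clusters = defaultdict(list)
    for i in range(2 ** n):
        x = format(i, f"0{n}b")
        clusters[(q_count(x), p_count(x))].append(x)
    return dict(clusters)


clusters = variant_clusters(4)
for (q, p), members in sorted(clusters.items()):
    print(f"V({q}, {p}, 4) = {members}")

# Lemma 10: L(p, n) is the union over q, and E(q, n) the union over p.
combinatorial = defaultdict(set)
crossing = defaultdict(set)
for (q, p), members in clusters.items():
    combinatorial[p].update(members)
    crossing[q].update(members)

print(len(clusters), len(combinatorial), len(crossing))   # 6 5 3
```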

**Definition 19** Let *CV* (*n*) be a number of non-trivial variant clusters in ∀*q*, *p*, {*V*(*q*, *p*, *n*)}.

In the general case for any given *n* > 1, the three sets of variant clusters are shown in Fig. 1.

**Theorem 1** *For a given n, CV* (*n*) *satisfies Eq. 20*

$$C\_V(n) = \begin{cases} n^2/4 + 2; & n \equiv 0 \bmod 2\\ (n^2 - 1)/4 + 2; & n \equiv 1 \bmod 2. \end{cases} \tag{20}$$

*Proof* From Fig. 1, for a given *n*, the triangular shape of non-trivial variant clusters is composed of two parts: a triangular area and two *q* = 0 points. The triangular area has length (*n* − 1) and height ⌊*n*/2⌋. If *n* ≡ 0 mod 2, the triangular area is a regular triangle containing *n*<sup>2</sup>/4 clusters, so the triangular shape contains *n*<sup>2</sup>/4 + 2 clusters in total. For an odd-valued *n*, the triangular area has an additional ⌊*n*/2⌋ clusters beside a regular triangle with ⌊*n*/2⌋<sup>2</sup> clusters, so the total number of clusters is ⌊*n*/2⌋<sup>2</sup> + ⌊*n*/2⌋ + 2 = (*n*<sup>2</sup> − 1)/4 + 2.

**Fig. 1** Three sets of variant clusters {*V*(*q*, *p*, *n*)}, {*E*(*q*, *n*)}, {*L*(*p*, *n*)} for *n* > 1
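Theorem 1 can be compared with a direct count of the non-empty (*q*, *p*) pairs for small *n*. The following sketch assumes that "non-trivial" means non-empty; the function names `c_v` and `c_v_brute` are hypothetical.

```python
def c_v(n: int) -> int:
    """Number of variant clusters according to Eq. 20."""
    if n % 2 == 0:
        return n * n // 4 + 2
    return (n * n - 1) // 4 + 2


def c_v_brute(n: int) -> int:
    """Count the distinct (q, p) pairs that occur among all 2^n vectors."""
    pairs = set()
    for i in range(2 ** n):
        x = format(i, f"0{n}b")
        q = sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))
        pairs.add((q, x.count("1")))
    return len(pairs)


print([c_v(n) for n in range(1, 9)])         # [2, 3, 4, 6, 8, 11, 14, 18]
print([c_v_brute(n) for n in range(1, 9)])   # same values by enumeration
```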

#### *3.1 Variant Trinomial Coefficients – Elementary Equation*

**Definition 20** Let *f<sub>V</sub>*(*q*, *p*, *n*) or *f*(*q*, *p*, *n*), 0 ≤ *p* ≤ *n*, 0 ≤ *q* ≤ ⌊*n*/2⌋, denote an enumeration function for the number of 0–1 vectors in a variant cluster.

It is convenient to list the relevant measuring parameters in Table 6 for the *n* = 4 case.

**Definition 21** For the two initial and end clusters *p* = {0, *n*}, *q* = 0, let the two cases be *f*(0, 0, *n*) = *f*(0, *n*, *n*) = 1. For other cases, each cluster with 0 < *p* < *n*, 0 < *q* ≤ ⌊*n*/2⌋ contains a subgroup of vectors under the given condition. A variant trinomial coefficient for the number of vectors in a cluster is defined as the elementary equation in Eq. 21,

$$f(q, p, n) = \frac{n}{n - p} \binom{n - p}{q} \binom{p - 1}{q - 1}.\tag{21}$$

Applying the variant trinomial coefficients in Eq. 21, it is not difficult to process more complicated cases in enumeration. Their global arrangements in triangular shapes are conveniently ordered by the *p* measure in the vertical direction. Two cases *n* = {4, 5} are shown in Table 7.
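The elementary equation can be compared with a brute-force count of vectors for small *n*. In the sketch below, `f_v` implements Eq. 21 together with the two end clusters of Definition 21 and returns 0 for parameter pairs outside the stated ranges; this convention and the names `f_v` and `variant_counts` are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def f_v(q: int, p: int, n: int) -> int:
    """Variant trinomial coefficient f(q, p, n) of Eq. 21 and Definition 21."""
    if q == 0:
        return 1 if p in (0, n) else 0     # the two end clusters
    if not 0 < p < n:
        return 0
    # Multiply first, then divide, so the arithmetic stays in integers.
    return n * comb(n - p, q) * comb(p - 1, q - 1) // (n - p)


def variant_counts(n: int) -> Counter:
    """Map (q, p) -> number of vectors, by enumerating all 2^n vectors."""
    counts = Counter()
    for i in range(2 ** n):
        x = format(i, f"0{n}b")
        q = sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))
        counts[(q, x.count("1"))] += 1
    return counts


print(f_v(1, 2, 4), f_v(2, 2, 4), f_v(2, 3, 6))   # 4 2 12
```

For example, *f*(2, 3, 6) = (6/3) ∗ 3 ∗ 2 = 12, the number of 6-tuple vectors with three one elements arranged in two cyclic blocks.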

In a general condition for any given *n* > 1, three sets of various numbers can be shown in Fig. 2.


**Table 6** Six clusters, vectors in {*V* (*q*, *p*, 4)}


**Table 7** Three sets of vector numbers { *f* (*q*, *p*, *n*)},{ *fE* (*q*, *n*)},{ *fL* (*p*, *n*)};(a) *n* = 4;(b) *n* = 5

**Fig. 2** Three sets of { *f*(*q*, *p*, *n*)}, { *f<sub>E</sub>*(*q*, *n*)}, { *f<sub>L</sub>*(*p*, *n*)} variant numbers for *n* > 1

#### *3.2 Combinatorial Projection on Variant Clusters*

From an algebraic viewpoint, the following theorems and corollaries are established for the general case of any *n* ≥ 1.

**Lemma 11** *If f<sub>L</sub>*(*p*, *n*) = ∑<sub>*q*=1</sub><sup>*p*</sup> *f*(*q*, *p*, *n*), 0 < *p* < *n, then the projection function f<sub>L</sub>*(*p*, *n*) *is a binomial coefficient and*

$$f\_L(p, n) = \binom{n}{p}. \tag{22}$$

*Proof* For a fixed *p*, 0 < *p* < *n*, all possible { *f*(*q*, *p*, *n*)} are collected to form the following combinatorial identities [14, 15, 21]:

$$\begin{split} f\_L(p,n) &= \sum\_{q=1}^p f(q,p,n) \\ &= \sum\_{q=1}^p \frac{n}{n-p} \binom{n-p}{q} \binom{p-1}{q-1} \\ &= \frac{n}{n-p} \sum\_{q=1}^p \binom{n-p}{q} \binom{p-1}{q-1} \\ &= \frac{n}{n-p} \sum\_{q=1}^p \binom{n-p}{q} \binom{p-1}{p-q}; \quad \binom{N}{k} = \binom{N}{N-k} \\ &= \frac{n}{n-p} \binom{n-1}{p}; \quad \binom{x+y}{N} = \sum\_{k=0}^N \binom{x}{k} \binom{y}{N-k} \\ &= \frac{n}{(n-p)} \frac{(n-1)!}{(n-p-1)!p!} \\ &= \frac{n!}{(n-p)!p!} \\ &= \binom{n}{p} .\end{split}$$
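The derivation above can be checked with exact rational arithmetic. The sketch below evaluates the sum over *q* for a range of *n* and also tests the Vandermonde step on its own; the names `f_v`, `f_l` and `vandermonde_step` are hypothetical.

```python
from fractions import Fraction
from math import comb


def f_v(q: int, p: int, n: int) -> Fraction:
    """Eq. 21 for 0 < p < n and q >= 1, kept as an exact fraction."""
    return Fraction(n, n - p) * comb(n - p, q) * comb(p - 1, q - 1)


def f_l(p: int, n: int) -> Fraction:
    """Combinatorial projection: sum of f(q, p, n) over q = 1 .. p."""
    return sum(f_v(q, p, n) for q in range(1, p + 1))


def vandermonde_step(p: int, n: int) -> int:
    """Sum of C(n-p, q) * C(p-1, p-q) over q = 1 .. p."""
    return sum(comb(n - p, q) * comb(p - 1, p - q) for q in range(1, p + 1))


for n in range(2, 7):
    print(n, [int(f_l(p, n)) for p in range(1, n)])
# 2 [2]
# 3 [3, 3]
# 4 [4, 6, 4]
# 5 [5, 10, 10, 5]
# 6 [6, 15, 20, 15, 6]
```

The printed rows are the inner entries of Pascal's triangle, as Lemma 11 states.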

For a complete sequence of binomial coefficients, it is necessary to include both initial and end clusters. Further Theorem 2 can be established.

**Theorem 2** *For any given n* > 0*, the set of projection functions* { *f<sub>L</sub>*(*p*, *n*)}<sub>*p*=0</sub><sup>*n*</sup> *is composed of the same sequence of binomial coefficients*

$$f\_L(p, n) = \binom{n}{p}. \tag{23}$$

*Proof* For the 0 < *p* < *n* condition, the equation has been determined by Lemma 11, and the two end clusters *p* = {0, *n*}, $\binom{n}{0}$ = $\binom{n}{n}$ = 1, are determined by Definition 21.

**Corollary 8** *The sum of all possible* { *f<sub>L</sub>*(*p*, *n*)}<sub>*p*=0</sub><sup>*n*</sup> *is equal to f*<sub>Ω</sub>(*n*)*,*

$$\sum\_{p=0}^{n} f\_L(p, n) = f\_\Omega(n) = 2^n. \tag{24}$$

*Proof* Collecting all possible numbers in Theorem 2, we have

$$\begin{aligned} \sum\_{p=0}^{n} f\_L(p, n) &= \sum\_{p=0}^{n} \binom{n}{p} \\ &= (1 + 1)^n \\ &= 2^n \\ &= f\_{\Omega}(n). \end{aligned}$$

#### *3.3 Crossing Projection on Variant Clusters*

**Lemma 12** *If f<sub>E</sub>*(*q*, *n*) = ∑<sub>*p*=*q*</sub><sup>*n*−*q*</sup> *f*(*q*, *p*, *n*), 1 ≤ *q* ≤ ⌊*n*/2⌋*, then the enumeration function f<sub>E</sub>*(*q*, *n*) *is double a binomial coefficient*

$$f\_E(q, n) = 2\binom{n}{2q}.\tag{25}$$

*Proof* For a fixed *q*, collecting all possible { *f*(*q*, *p*, *n*)}<sub>*p*=*q*</sub><sup>*n*−*q*</sup>, the following combinatorial identities [14, 15, 21] are deduced:

$$\begin{aligned} f\_E(q, n) &= \sum\_{p=q}^{n-q} f(q, p, n) \\ &= \sum\_{p=q}^{n-q} \frac{n}{n-p} \binom{n-p}{q} \binom{p-1}{q-1} \\ &= \sum\_{p=q}^{n-q} \frac{n}{q} \binom{n-p-1}{q-1} \binom{p-1}{q-1}; \quad \frac{N}{q} \binom{N-p-1}{q-1} = \frac{N}{N-p} \binom{N-p}{q} \\ &= \frac{n}{q} \sum\_{p=q}^{n-q} \binom{n-p-1}{q-1} \binom{p-1}{q-1} \\ &= \frac{n}{q}\binom{n-1}{2q-1};\quad \binom{N+1}{r+s+1}=\sum\_{k=r}^{N-s} \binom{k}{r}\binom{N-k}{s} \\ &= 2\frac{n}{2q}\frac{(n-1)!}{(n-2q)!(2q-1)!} \\ &= 2\frac{n!}{(2q)!(n-2q)!} \\ &= 2\binom{n}{2q}.\end{aligned}$$
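The crossing projection can be checked in the same exact manner, summing over *p* from *q* to *n* − *q*. The names `f_v`, `f_e` and `convolution_step` are assumptions used only in this sketch.

```python
from fractions import Fraction
from math import comb


def f_v(q: int, p: int, n: int) -> Fraction:
    """Eq. 21 for 0 < p < n and q >= 1, kept as an exact fraction."""
    return Fraction(n, n - p) * comb(n - p, q) * comb(p - 1, q - 1)


def f_e(q: int, n: int) -> Fraction:
    """Crossing projection: sum of f(q, p, n) over p = q .. n - q."""
    return sum(f_v(q, p, n) for p in range(q, n - q + 1))


def convolution_step(q: int, n: int) -> int:
    """Sum of C(n-p-1, q-1) * C(p-1, q-1) over p = q .. n - q."""
    return sum(comb(n - p - 1, q - 1) * comb(p - 1, q - 1)
               for p in range(q, n - q + 1))


for n in (4, 5, 6):
    print(n, [int(f_e(q, n)) for q in range(1, n // 2 + 1)])
# 4 [12, 2]
# 5 [20, 10]
# 6 [30, 30, 2]
```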

**Theorem 3** *For any given n* > 0 *under the listed condition, the set of projection functions* { *f<sub>E</sub>*(*q*, *n*)}<sub>0≤*q*≤⌊*n*/2⌋</sub> *is composed of a subsequence of binomial coefficients,*

$$f\_E(q, n) = 2\binom{n}{2q}.\tag{26}$$

*Proof* For the 1 ≤ *q* ≤ ⌊*n*/2⌋ condition, the equations are determined by Lemma 12, and for the initial subgroup we have *q* = 0, *f<sub>E</sub>*(0, *n*) = $\binom{n}{0}$ + $\binom{n}{n}$ = 2$\binom{n}{0}$.

**Corollary 9** *For n* ≡ 0 mod 2, 0 ≤ *q* ≤ *n*/2*, there are a pair of symmetric functions*

$$f\_E(q, n) = f\_E(n/2 - q, n). \tag{27}$$

*Proof* Under *n* ≡ 0 mod 2 condition,

$$\begin{aligned} f\_E(q, n) &= 2 \binom{n}{2q} \\ &= 2 \binom{n}{n - 2q} = 2 \binom{n}{2(n/2 - q)} \\ &= f\_E(n/2 - q, n). \end{aligned}$$

**Corollary 10** *For n* ≡ 0 mod 4, *q* = *n*/4*, fE* (*n*/4, *n*) *has the maximal value*

$$f\_E(n/4, n) > f\_E(q, n), \quad q \neq n/4. \tag{28}$$

*Proof* Under *n* ≡ 0 mod 4 condition,

$$f\_E(q, n) = 2\binom{n}{2q} < 2\binom{n}{n/2} = 2\binom{n}{2n/4} = f\_E(n/4, n).$$

**Corollary 11** *The sum of all possible* { *f<sub>E</sub>*(*q*, *n*)}<sub>0≤*q*≤⌊*n*/2⌋</sub> *is equal to f*<sub>Ω</sub>(*n*)*,*

$$\sum\_{q=0}^{\lfloor \frac{n}{2} \rfloor} f\_E(q, n) = f\_\Omega(n) = 2^n. \tag{29}$$

*Proof* Collecting all possible numbers, we have the following equations:

$$\begin{aligned} \sum\_{q=0}^{\lfloor \frac{n}{2} \rfloor} f\_E(q, n) &= \sum\_{q=0}^{\lfloor \frac{n}{2} \rfloor} 2 \binom{n}{2q} \\ &= 2 \sum\_{q=0}^{\lfloor \frac{n}{2} \rfloor} \binom{n}{2q}, \quad \sum\_{k \ge 0} \binom{n}{2k} = \sum\_{k \ge 0} \binom{n}{2k+1} = 2^{n-1} \\ &= 2 \times 2^{n-1} \\ &= 2^n \\ &= f\_{\Omega}(n). \end{aligned}$$

#### *3.4 Relationships of Four Symmetric Clusters*

**Theorem 4** *For any n* > 0*, the sum of all possible functions in* { *f*(*q*, *p*, *n*)}<sub>∀*p*,∀*q*</sub> *or* { *f<sub>E</sub>*(*q*, *n*)}<sub>0≤*q*≤⌊*n*/2⌋</sub> *or* { *f<sub>L</sub>*(*p*, *n*)}<sub>*p*=0</sub><sup>*n*</sup> *or* { *f<sub>G</sub>*(*m*, *n*)}, 1 ≤ *m* ≤ *C<sub>G</sub>*(*n*)*, is equal to f*<sub>Ω</sub>(*n*)

$$\begin{split} f\_{\Omega}(n) &= \sum\_{\forall p} \sum\_{\forall q} f(q, p, n) = \sum\_{q=0}^{\lfloor \frac{n}{2} \rfloor} f\_E(q, n) = \sum\_{p=0}^n f\_L(p, n) \\ &= \sum\_{m=1}^{C\_G(n)} f\_G(m, n) \\ &= 2^n. \end{split}$$

*Proof* From the results of Corollaries 4, 8 and 11, the four schemes provide different complete partitions of the same set of vectors Ω(*n*).

**Corollary 12** *The numbers of the four symmetric clusters can be expressed by*

$$\begin{aligned} C\_E(n) &= \lfloor \frac{n}{2} \rfloor + 1; \\ C\_L(n) &= n + 1; \\ C\_V(n) &= \begin{cases} n^2/4 + 2, & n \equiv 0 \bmod 2 \\ (n^2 - 1)/4 + 2, & n \equiv 1 \bmod 2 \end{cases}; \\ C\_G(n) &= \frac{1}{n} \sum\_{k|n} \phi(k) 2^{\frac{n}{k}}. \end{aligned}$$

*Proof* By Lemmas 4, 6, 8 and Theorem 1, the four equations for the numbers of the various symmetric clusters are listed.

For convenience of comparison, their values for 1 ≤ *n* ≤ 16 are listed in Table 8.

**Table 8** Numbers of four symmetric clusters in 1 ≤ *n* ≤ 16
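The four cluster numbers can be tabulated directly from their closed forms. The following sketch prints one row per *n* for 1 ≤ *n* ≤ 16; it reads the exponent in the rotation formula as *n*/*k* (as in Lemma 13), and the function names are illustrative assumptions.

```python
from math import gcd


def phi(k: int) -> int:
    """Euler's phi-function from its definition."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def c_e(n: int) -> int:
    return n // 2 + 1


def c_l(n: int) -> int:
    return n + 1


def c_v(n: int) -> int:
    return n * n // 4 + 2 if n % 2 == 0 else (n * n - 1) // 4 + 2


def c_g(n: int) -> int:
    return sum(phi(k) * 2 ** (n // k) for k in range(1, n + 1) if n % k == 0) // n


def cluster_numbers(n: int) -> tuple[int, int, int, int]:
    """(C_E, C_L, C_V, C_G) for a given n."""
    return c_e(n), c_l(n), c_v(n), c_g(n)


print(f"{'n':>2} {'C_E':>4} {'C_L':>4} {'C_V':>4} {'C_G':>5}")
for n in range(1, 17):
    e, l, v, g = cluster_numbers(n)
    print(f"{n:>2} {e:>4} {l:>4} {v:>4} {g:>5}")
```

For instance, `cluster_numbers(6)` returns `(4, 7, 11, 14)`.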

Checking real clusters in four schemes, the following corollaries can be provided.

**Corollary 13** *When n* = {1, 2, 3}*, the three cluster schemes C<sub>L</sub>*(*n*), *C<sub>V</sub>*(*n*), *C<sub>G</sub>*(*n*) *provide the same partitions of clusters.*

*Proof* Checking the three schemes, we have *C<sub>L</sub>*(1) = *C<sub>V</sub>*(1) = *C<sub>G</sub>*(1) = 2, *C<sub>L</sub>*(2) = *C<sub>V</sub>*(2) = *C<sub>G</sub>*(2) = 3, *C<sub>L</sub>*(3) = *C<sub>V</sub>*(3) = *C<sub>G</sub>*(3) = 4. The corresponding clusters contain the same sets of vectors.

**Corollary 14** *When n* = {1, 2, 3, 4, 5}*, the two cluster schemes C<sub>V</sub>*(*n*), *C<sub>G</sub>*(*n*) *provide the same partitions of clusters.*

*Proof* By Corollary 13, we only need to check the *n* = {4, 5} cases. For the two schemes, we have (*C<sub>L</sub>*(4) = 5) ≠ (*C<sub>V</sub>*(4) = *C<sub>G</sub>*(4) = 6), (*C<sub>L</sub>*(5) = 6) ≠ (*C<sub>V</sub>*(5) = *C<sub>G</sub>*(5) = 8). The corresponding clusters contain the same sets of vectors.

**Corollary 15** *When n* ≥ 6*, the four cluster schemes C<sub>E</sub>*(*n*), *C<sub>L</sub>*(*n*), *C<sub>V</sub>*(*n*), *C<sub>G</sub>*(*n*) *provide different partitions on their clusters.*

*Proof* By Corollaries 13 and 14, we need to check the *n* = {6, ···} cases. For the four schemes, *C<sub>E</sub>*(6) = 4, *C<sub>L</sub>*(6) = 7, *C<sub>V</sub>*(6) = 11, *C<sub>G</sub>*(6) = 14. Only a few clusters can contain the same set of vectors.

**Corollary 16** *When n* ≥ 6*, the three cluster schemes (combinatorial, crossing and variant)* {*C<sub>E</sub>*(*n*), *C<sub>L</sub>*(*n*), *C<sub>V</sub>*(*n*)} *may contain more symmetric properties than the rotation clusters of C<sub>G</sub>*(*n*)*.*

*Proof* Consider the special case {*n* = 6, *p* = 3, *q* = 2}, *V*(2, 3, 6) = {001101, 011010, 110100, 101001, 010011, 100110, 011001, 110010, 100101, 001011, 010110, 101100}. This cluster contains two cycles, {001101, 011010, 110100, 101001, 010011, 100110} and {011001, 110010, 100101, 001011, 010110, 101100}, with six vectors each. Both cycles have rotation symmetries only, without reflection symmetries. It is possible to use reflection symmetric operators to distinguish the two related cycles and form a pure rotation symmetric structure. However, other clusters may contain more cycles, such as *L*(3, 6) with four cycles and *E*(2, 6) with six cycles. It is necessary to apply symmetric operators other than rotation for further separation.
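The cycle counts quoted in this proof can be reproduced by splitting each cluster into rotation cycles. In the sketch below, each cycle is labelled by its lexicographically smallest rotation; this labelling and the names `canonical` and `rotation_cycles` are assumptions made for illustration.

```python
def p_count(x: str) -> int:
    return x.count("1")


def q_count(x: str) -> int:
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def canonical(x: str) -> str:
    """Lexicographically smallest rotation, used as a label for the cycle."""
    return min(x[r:] + x[:r] for r in range(len(x)))


def rotation_cycles(cluster: list[str]) -> dict[str, list[str]]:
    """Split a cluster of vectors into its rotation cycles."""
    cycles: dict[str, list[str]] = {}
    for x in cluster:
        cycles.setdefault(canonical(x), []).append(x)
    return cycles


omega6 = [format(i, "06b") for i in range(2 ** 6)]
v_236 = [x for x in omega6 if (q_count(x), p_count(x)) == (2, 3)]
l_36 = [x for x in omega6 if p_count(x) == 3]
e_26 = [x for x in omega6 if q_count(x) == 2]

for name, cluster in (("V(2,3,6)", v_236), ("L(3,6)", l_36), ("E(2,6)", e_26)):
    print(name, len(cluster), len(rotation_cycles(cluster)))
# V(2,3,6) 12 2
# L(3,6) 20 4
# E(2,6) 30 6
```

Reversing any vector of *V*(2, 3, 6) moves it into the other cycle, which is the sense in which a reflection operator separates the two cycles.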

#### **4 Four Number Sets of Symmetric Clusters**

#### *4.1 Four Approximates on Numbers of Clusters*

Using the four numeric equations, relevant approximates can be expressed as follows.

**Lemma 13** *Four approximates can be expressed as*

$$C\_E(n) \approx O\left(\frac{n}{2}\right);\tag{31}$$

$$C\_L(n) \approx O\left(n\right);\tag{32}$$

$$C\_V(n) \approx O\left(\frac{n^2}{4}\right);\tag{33}$$

$$C\_G(n) \approx O\left(\frac{2^n}{n}\right). \tag{34}$$

*Proof* Using the four equations, the following approximates can be expressed:

$$\begin{aligned} C\_E(n) &= \lfloor \frac{n}{2} \rfloor + 1 \approx O\left(\frac{n}{2}\right); \\ C\_L(n) &= n + 1 \approx O\left(n\right); \\ C\_V(n) &= \begin{cases} n^2/4 + 2, & n \equiv 0 \bmod 2 \\ (n^2 - 1)/4 + 2, & n \equiv 1 \bmod 2 \end{cases} \approx O\left(\frac{n^2}{4}\right); \\ C\_G(n) &= \frac{1}{n} \sum\_{k|n} \phi(k) 2^{\frac{n}{k}} \approx O\left(\frac{2^n}{n}\right). \end{aligned}$$

#### *4.2 Four Approximates on Numbers of Vectors*

**Definition 22** Let *fX* (*n*), *X* ∈ {*L*, *E*, *V*, *G*} denote an approximate number of vectors in *X* cluster.

**Lemma 14** *Four approximates can be expressed as*

$$f\_E(n) \approx O\left(\frac{2^{n+1}}{n}\right);\tag{35}$$

$$f\_L(n) \approx O\left(\frac{2^n}{n}\right);\tag{36}$$

$$f\_V(n) \approx O\left(\frac{2^{n+2}}{n^2}\right);\tag{37}$$

$$f\_G(n) \approx O\left(n\right). \tag{38}$$

*Proof* Since all clusters partition the same phase space Ω(*n*) with 2*<sup>n</sup>* vectors, their approximates for vectors in a cluster can be evaluated,

$$\begin{aligned} f\_E(n) &= \frac{2^n}{O\left(\frac{n}{2}\right)} \approx O\left(\frac{2^{n+1}}{n}\right); \\ f\_L(n) &= \frac{2^n}{O\left(n\right)} \approx O\left(\frac{2^n}{n}\right); \\ f\_V(n) &= \frac{2^n}{O\left(\frac{n^2}{4}\right)} \approx O\left(\frac{2^{n+2}}{n^2}\right); \\ f\_G(n) &= \frac{2^n}{O\left(\frac{2^n}{n}\right)} \approx O\left(n\right). \end{aligned}$$

It is convenient to list approximate numbers on clusters, vectors and dimension of measuring phase spaces in Table 9.


**Table 9** Four approximate numbers on both clusters and vectors

#### **5 Symmetric Boolean Functions for Selected Clusters**

#### *5.1 Four Numbers on Symmetric Boolean Functions*

**Definition 23** Let *SFX* (*n*) denote a number of Symmetric Boolean Functions (SBF) in {*X*(.)}, *X* ∈ {*E*, *L*, *V*, *G*}.

**Theorem 5** (Four types of symmetric Boolean functions) *Total numbers of four types of symmetric Boolean functions SFX* (*n*), *X* ∈ {*E*, *L*, *V*, *G*} *are*

$$SF\_E(n) = 2^{C\_E(n)} = 2^{\lfloor \frac{n}{2} \rfloor + 1};\tag{39}$$

$$SF\_L(n) = 2^{C\_L(n)} = 2^{n+1};\tag{40}$$

$$SF\_V(n) = 2^{C\_V(n)} = \begin{cases} 2^{n^2/4 + 2}, & n \equiv 0 \bmod 2 \\ 2^{(n^2 - 1)/4 + 2}, & n \equiv 1 \bmod 2 \end{cases};\tag{41}$$

$$SF\_G(n) = 2^{C\_G(n)} = O\left(2^{\frac{2^n}{n}}\right). \tag{42}$$

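*Proof* For any selected cluster, there are two selections (output 0 or output 1) for its symmetric Boolean functions, so a scheme with *C<sub>X</sub>*(*n*) clusters yields 2<sup>*C<sub>X</sub>*(*n*)</sup> functions.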
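For small *n*, Theorem 5 can be illustrated by constructing every function that is constant on each cluster and counting the distinct truth tables. This is a brute-force sketch, feasible only for small numbers of clusters; the names `KEYS` and `symmetric_functions` and the use of the smallest rotation as a rotation-cluster label are assumptions.

```python
def p_count(x: str) -> int:
    return x.count("1")


def q_count(x: str) -> int:
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def canonical(x: str) -> str:
    return min(x[r:] + x[:r] for r in range(len(x)))


# One cluster label per scheme: crossing, combinatorial, variant, rotation.
KEYS = {
    "E": q_count,
    "L": p_count,
    "V": lambda x: (q_count(x), p_count(x)),
    "G": canonical,
}


def symmetric_functions(n: int, scheme: str) -> set[tuple[int, ...]]:
    """Truth tables of all Boolean functions that are constant on each cluster."""
    key = KEYS[scheme]
    xs = [format(i, f"0{n}b") for i in range(2 ** n)]
    labels = sorted({key(x) for x in xs})
    tables = set()
    for mask in range(2 ** len(labels)):
        # Clusters whose bit is set in mask are mapped to output 1.
        chosen = {lab for i, lab in enumerate(labels) if (mask >> i) & 1}
        tables.add(tuple(int(key(x) in chosen) for x in xs))
    return tables


for scheme in "ELVG":
    print(scheme, len(symmetric_functions(4, scheme)))
# E 8
# L 32
# V 64
# G 64
```

For *n* = 4 the four schemes have 3, 5, 6 and 6 clusters, giving 2<sup>3</sup>, 2<sup>5</sup>, 2<sup>6</sup> and 2<sup>6</sup> functions respectively.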

#### *5.2 Four Numbers of Balanced Symmetric Clusters*

**Definition 24** Let *SF<sub>Xb</sub>*(*n*) be the maximal number of balanced *SBF<sub>X</sub>* in {*X*(.)}, *X* ∈ {*L*, *V*, *G*}, *n* ≡ 0 mod 2.

**Definition 25** Let *SF<sub>Eb</sub>*(*n*) be the maximal number of balanced *SBF<sub>E</sub>* in ∃*q*, {*E*(*q*, *n*)}, *n* ≡ 0 mod 4.

**Lemma 15** *Four selected numbers* {*CXb*(*n*)}, *X* ∈ {*E*, *L*, *V*, *G*} *for balanced symmetric clusters are*

$$C\_{Eb}(n) = \begin{cases} 1, & n \equiv 0 \mod 4 \\ 0, & n \not\equiv 0 \mod 4 \end{cases};\tag{43}$$

$$C\_{Lb}(n) = 1;\tag{44}$$

$$C\_{Vb}(n) = \frac{n}{2};\tag{45}$$

$$C\_{Gb}(n) = O\left(\frac{1}{n} \binom{n}{n/2}\right). \tag{46}$$

*Proof* From Corollary 10, for *Eb* groups in the *n* ≡ 0 mod 4 cases, *q* = *n*/4 provides a cluster with a maximal number of vectors in a balanced condition, and other cases cannot satisfy balanced conditions; for *Lb* groups in the *n* ≡ 0 mod 2 cases, *p* = *n*/2 provides a cluster with a maximal number of vectors in a balanced condition; for *Vb* groups in the *n* ≡ 0 mod 2 cases, *p* = *n*/2, 1 ≤ *q* ≤ *n*/2, there are *n*/2 clusters involved in a balanced condition; for *Gb* groups in the *n* ≡ 0 mod 2 cases, *p* = *n*/2, a total of *O*($\frac{1}{n}\binom{n}{n/2}$) rotation symmetric clusters could be involved in a balanced condition.

**Table 10** Numbers of four balanced symmetric functions in 2 ≤ *n* ≤ 20
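Under one reading of Lemma 15, the numbers for the *L*, *V* and *G* schemes count the clusters lying at *p* = *n*/2. The sketch below enumerates these clusters for small even *n* and prints the estimate (1/*n*) ∗ C(*n*, *n*/2) next to the exact number of rotation clusters. The *E* scheme is left out because its balanced condition is stated differently. The function name `balanced_cluster_counts` and this reading are assumptions.

```python
from math import comb


def p_count(x: str) -> int:
    return x.count("1")


def q_count(x: str) -> int:
    n = len(x)
    return sum(x[j] == "0" and x[(j + 1) % n] == "1" for j in range(n))


def canonical(x: str) -> str:
    return min(x[r:] + x[:r] for r in range(len(x)))


def balanced_cluster_counts(n: int) -> tuple[int, int, int]:
    """Numbers of L, V and G clusters among the vectors with p = n/2 (n even)."""
    half = [x for x in (format(i, f"0{n}b") for i in range(2 ** n))
            if p_count(x) == n // 2]
    c_lb = len({p_count(x) for x in half})
    c_vb = len({q_count(x) for x in half})
    c_gb = len({canonical(x) for x in half})
    return c_lb, c_vb, c_gb


for n in range(2, 11, 2):
    c_lb, c_vb, c_gb = balanced_cluster_counts(n)
    print(n, c_lb, c_vb, c_gb, round(comb(n, n // 2) / n, 2))
# 2 1 1 1 1.0
# 4 1 2 2 1.5
# 6 1 3 4 3.33
# 8 1 4 10 8.75
# 10 1 5 26 25.2
```

In this range the variant count equals *n*/2, while the exact rotation count stays close to the estimate in Eq. 46.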

#### *5.3 Four Numbers of Balanced Symmetric Boolean Functions*

**Theorem 6** (Four balanced symmetric Boolean functions) *Total numbers of four balanced symmetric Boolean functions* {*SF<sub>Xb</sub>*(*n*)}, *X* ∈ {*E*, *L*, *V*, *G*} *are*

$$SF\_{Eb}(n) = 2^{C\_{Eb}(n)} = \begin{cases} 2, & n \equiv 0 \mod 4 \\ 1, & n \not\equiv 0 \mod 4 \end{cases};\tag{47}$$

$$SF\_{Lb}(n) = 2^{C\_{Lb}(n)} = 2;\tag{48}$$

$$SF\_{Vb}(n) = 2^{C\_{Vb}(n)} = 2^{\frac{n}{2}};\tag{49}$$

$$SF\_{Gb}(n) = 2^{C\_{Gb}(n)} = O\left(2^{\frac{1}{n}\binom{n}{n/2}}\right). \tag{50}$$

*Proof* Each number of clusters in a selected scheme has been determined in Lemma 15. For any selected cluster in the scheme, there are two selections to form relevant symmetric Boolean functions.

For convenience of comparison, the four types of *SBF<sub>Xb</sub>* numbers for 2 ≤ *n* ≤ 20 are listed in Table 10.

#### **6 Cryptographic Properties of Symmetric Boolean Functions in Hierarchy**

Boolean functions are of great importance in the design of random number generators for stream ciphers [25], which are widely used in modern network environments.

For cryptographic security, the sequence produced by the random number generator must satisfy various properties [6, 8]: a long period, high period complexity and good statistical distributions. There exists a large body of theoretical knowledge on such combining generators [25].

A symmetric Boolean function must fulfil different necessary criteria to yield a cryptographically secure scheme, at least to resist known attacks [11]. In this direction, various measuring parameters play an important role, such as balancedness, support set, Hamming weight, Hamming distance, non-linearity, correlation immunity, etc. [6, 8].

In relation to balanced properties, when *n* is even, the functions of highest nonlinearity are the bent functions, and it is well known that the bent functions cannot be the balanced functions [28, 33]. From a structural viewpoint, the balanced functions having the highest possible non-linearity need to be considered. However, finding such functions is a very difficult problem [29, 31, 33]. When *n* is odd, exhibiting functions of the highest non-linearity is a hard problem in itself. Among the available candidates, balanced ones exist [16, 33].

To explore optimal functions in rotation symmetric Boolean function sets, researchers face extreme difficulties in computational complexity even for *n* > 10 [29]. Exponentially increasing complexity quickly makes an exhaustive search impossible. Comparing the variant and rotation schemes listed in Table 10, it is interesting to notice that the variant scheme at *n* = 20 has the same numeric complexity as the rotation symmetric scheme at *n* = 10. Much faster computation of optimal functions could therefore be feasible.

From a meta-analytic viewpoint, measuring phase spaces provide multiple levels of construction in a hierarchy linked to various symmetric Boolean functions. They support an *n*-tuple 0–1 vector construction as a word-based 0–1 vector to satisfy various design and analysis purposes. The variant PRNG construction [38, 43] is an approach similar to the RC4 and HC128 stream ciphers [25] in their meta phase spaces, using the word-oriented vector structure with higher speed and efficiency. Measuring phase spaces could support advanced cryptographic applications in this direction.

Because of the significant differences between the proposed measuring phase spaces and the classically formulated algebraic normal forms, in addition to the initial balanced symmetric properties discussed in this chapter, other advanced comparison mechanisms need to be established for all cryptographic properties of interest, in order to satisfy practical and optimal requirements for stream ciphers. Further detailed research and exploration are required.

#### **7 Conclusion**

Symmetric clusters in a hierarchy provide the additional information to organize various symmetric Boolean functions into hierarchical constructions as multiple meta levels of structures efficiently. The variant symmetric functions proposed in this chapter provide a meta construction on a 2D measuring phase space to contribute richer capacities compared with the three classical schemes (combinatorial, crossing and rotation) on 1D measuring phase spaces.

From a measuring viewpoint, the three schemes (combinatorial, variant and rotation) in Tables 8, 9 and 10 have similar values for *n* = {1, 2, 3} and {4, 5} but different values for *n* ≥ 6. The variant scheme provides a 2D intermediate structure, unlike the other two schemes with their 1D structures. From an approximate viewpoint, the combinatorial and rotation schemes show strongly similar properties: their approximate number of clusters and number of vectors in a cluster can be exchanged in Table 9. From an abstract system viewpoint, this pair of exchangeable measurements may provide approximate symmetric properties for both combinatorial and rotation schemes.

From a clustering viewpoint, the most important results are summarized in Theorem 4 to show that the four symmetric cluster schemes are different partition schemes on the same 0–1 vector set.

From a balanced analysis viewpoint, the key results on balanced symmetric Boolean functions are summarized in Theorem 6 and Table 10. This set of results provides a basic measurement that illustrates the computational difficulty of exploring further optimal properties under balanced symmetric conditions. Unlike the other three schemes (combinatorial, crossing and rotation), which become either very simple or extremely complex as *n* increases, balanced variant symmetric Boolean functions present very interesting patterns that support even *n* ≥ 20 cases for future exploration.

Many advanced properties exist for using a meta hierarchical construction to organize relevant measuring phase spaces into multiple levels of a hierarchical structure. Various measuring parameters can be used as control parameters in detailed cases. Refined design and analysis can be performed under this meta hierarchy to provide powerful models and tools for the design and optimization of future generations of stream ciphers.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part III Theoretical Foundation—Variant Map

Arc, amplitude, and curvature sustain a similar relation to each other as time, motion, and velocity, or as volume, mass, and density.

—Carl Friedrich Gauss

As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each mutual forces, and have marched together towards perfection.

—Joseph-Louis Lagrange

The arithmetical symbols are written diagrams and the geometrical figures are graphic formulas.

—David Hilbert

In relation to the variant map, a longer book chapter (Chapter "Interactive Maps on Variant Phase Spaces") was published in the open access book Emerging Application of Cellular Automata: 113–196 (2013) by InTech Press. It provides systematic approaches under statistical mechanics for comparison. Possible projections and their mapping mechanisms are explored.

Part III is composed of three chapters (6–8).

Chapter "Variant Maps of Elementary Equations" provides variant maps of elementary equations to generate visual distributions using two cases of combinatorial expressions. In both cases, it is interesting to see symmetric distributions under various parameters; complex distributions created by control parameters are shown in 2D and 3D distributions and their projections.

Chapter "Variant Map System of Random Sequences" describes a variant map system of random sequences; five types of maps are defined, comprising two types of 1D maps and three types of 2D maps. A sample sequence from the AES cipher is selected and multiple maps are illustrated.

Chapter "Stationary Randomness of Three Types of Six Random Sequences on Variant Maps" proposes a testing system for stationary randomness of random sequences on variant maps. Three types of six random sequences are selected: the six samples come from three kinds of random sources, namely two block ciphers, two stream ciphers, and two quantum ciphers. Three variation categories are observed.

# **Variant Maps of Elementary Equations**

#### **Jeffrey Zheng**

**Abstract** Using four measures in Type B, there are 11 invariant expressions that form elementary equations of variant measurement. In this chapter, two invariant expressions are selected to illustrate sample procedures from elementary equations to relevant variant maps. Using various projections and multiple levels of representation, complicated binomial coefficients and their variations are illustrated under various conditions. Using multinomial coefficients, multiple viewpoints are provided for reference. Because this type of variation framework contains rich structures, further exploration is required at multiple levels on both theoretical foundations and practical applications.

**Keywords** Variant measurement · Elementary equation · Variant map · Multinomial coefficient · Coefficient array

## **1 Introduction**

Variant construction starts from *n* 0–1 variables to form 2<sup>*n*</sup> states and 2<sup>2<sup>*n*</sup></sup> functions; vector permutation and complement operations on the state space establish a variant logic framework containing 2<sup>*n*</sup>! × 2<sup>2<sup>*n*</sup></sup> configurations as a variation space. Variant measurement acts as the core of quantitative measurement, starting from *m* 0–1 variables to explore relevant clustering conditions on 2<sup>*m*</sup> states. This type of variation is closely related to partition and recombination using binomial and multinomial coefficients under identical combinatorial expressions. Applying the results in Chapter "Elementary Equations of Variant Measurement", Type B measures are composed of 11 nontrivial invariants. Two invariants are selected in this chapter, and their different partition properties are illustrated using coefficients in 2D and 3D distributions. Variant maps are generated from coefficient arrays as samples.

J. Zheng (B)

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_6

#### **2 Measures and Maps**

Two combinatorial invariants are selected: {*m* − *p*}{*p*} and {2*q*}{*m* − 2*q*}. Different distributions on their coefficients are explored.

#### *2.1 Case 1. {m − p}{p}*

For the {*m* − *p*}{*p*} invariant, the relevant equation is

$$
\binom{m}{p} = \sum_{k=0}^{p} \binom{m-p}{k} \binom{p}{k} \tag{1}
$$

A binomial coefficient is thus separated into a sum of (*p* + 1) products of paired binomial coefficients. For a selected value *p*, the coefficients $\left\{\binom{m-p}{k}\binom{p}{k}\right\}$, 0 ≤ *k* ≤ *p*, are arranged in a linear order.

This property holds for all *p* values. A three-tuple (*m*, *p*, *k*) has a 1–1 correspondence with a coefficient $f(m, p, k) = \binom{m-p}{k}\binom{p}{k}$. As *m* increases, the coefficient array grows in 3D rectangular steps; each *m* value occupies an (*m* + 1)<sup>2</sup> region.

The nontrivial coefficients are distributed as a triangle. Let $F(m, p) = \forall k\, f(m, p, k)$, 0 ≤ *p* ≤ *m*, and $G(m, k) = \forall p\, f(m, p, k)$, 0 ≤ *k* ≤ *m*; the two projections {*F*(*m*, *p*), *G*(*m*, *k*)} can then be formed. The coefficients and the four relevant maps are shown in Fig. 1.

**Lemma 1** *For the* {*m* − *p*}{*p*} *equation, coefficients are distributed in an* (*m* + 1)<sup>2</sup> *region; all nontrivial coefficients are clustered in a 1/4 region, and the remaining 3/4 region has coefficient 0.*
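As a minimal sketch of Eq. (1) and Lemma 1, the following Python code builds the coefficient array *f*(*m*, *p*, *k*) for one *m*, checks the sum over *k* against the binomial coefficient, and counts the nonzero cells. The function names are hypothetical, and reading the projections *F* and *G* as sums over *k* and *p* is an assumption, since the text does not spell out the ∀ operation.

```python
from math import comb

def case1_array(m):
    """Return the (m+1) x (m+1) array f[p][k] = C(m-p, k) * C(p, k)."""
    return [[comb(m - p, k) * comb(p, k) for k in range(m + 1)]
            for p in range(m + 1)]

def projections(f):
    """Assumed projections: F sums each row over k, G sums each column over p."""
    F = [sum(row) for row in f]
    G = [sum(col) for col in zip(*f)]
    return F, G

m = 10
f = case1_array(m)
F, G = projections(f)

# Eq. (1): summing over k recovers the binomial coefficient C(m, p).
assert all(F[p] == comb(m, p) for p in range(m + 1))

# A cell is nonzero exactly when k <= min(p, m - p): the triangular region.
nonzero = sum(1 for row in f for v in row if v)
print(nonzero, (m + 1) ** 2, nonzero / (m + 1) ** 2)
```

For *m* = 10 the nonzero share is 36/121; in general it equals (⌊*m*/2⌋ + 1)(⌈*m*/2⌉ + 1)/(*m* + 1)<sup>2</sup>, which approaches the 1/4 of Lemma 1 as *m* grows.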

#### *2.2 Case 2. {2q}{m − 2q}*

Both {*m* − *p*}{*p*} and {2*q*}{*m* − 2*q*} are simple invariants. The {2*q*}{*m* − 2*q*} invariant has the following equation.

**Fig. 1** One set of coefficients and its two projections in four maps (**a**)–(**d**); **a** 3D *f (*10*, p, k)*; **b** 2D *f (*10*, p, k)*; **c** 1D *F(*10*, p)*; **d** 1D *G(*10*, k)*

$$
\binom{m}{p} = \sum_{k=0}^{p} \binom{2q}{k} \binom{m-2q}{p-k} \tag{2}
$$

where *q* is a free variable, 0 ≤ *q* ≤ ⌊*m*/2⌋. Unlike Case 1, this equation determines ⌊*m*/2⌋ + 1 levels of coefficients, one for each selected *q* value, which together form a 3D coefficient structure.

Let $f(m, q, p, k) = \binom{2q}{k}\binom{m-2q}{p-k}$ under the conditions 0 ≤ *q* ≤ ⌊*m*/2⌋ and 0 ≤ *k*, *p* ≤ *m*; the nontrivial coefficients are distributed in special shapes on multiple 2D regions.

Using a color coding scheme, it is feasible to map coefficients into greyscale or color pixels as variant maps.

A binomial coefficient can be separated into a sum of (*p* + 1) coefficient products $\left\{\binom{2q}{k}\binom{m-2q}{p-k}\right\}$, 0 ≤ *k* ≤ *p*, arranged in a linear order.

This property holds for all *p* values; a tuple of four parameters (*m*, *q*, *p*, *k*) has a 1–1 correspondence with the coefficient $\binom{2q}{k}\binom{m-2q}{p-k}$. Each selected *m* value corresponds to an $(m+1)^2 \times (\lfloor m/2 \rfloor + 1)$ region that locates all coefficients.

**Lemma 2** *For the* {2*q*}{*m* − 2*q*} *combinatorial invariant, all coefficients are restricted to an* $(m+1)^2 \times (\lfloor m/2 \rfloor + 1)$ *region.*
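The layered structure of Case 2 can be sketched in the same way. The code below is an illustrative assumption about how the levels might be stored (one 2D array per *q*); the function name `case2_levels` is hypothetical. It checks Eq. (2) on every level and reports the size of the region named in Lemma 2.

```python
from math import comb

def case2_levels(m):
    """Return levels[q][p][k] = C(2q, k) * C(m-2q, p-k) for 0 <= q <= m // 2."""
    levels = []
    for q in range(m // 2 + 1):
        # The guard k <= p keeps p - k nonnegative; cells with k > p are 0.
        level = [[comb(2 * q, k) * comb(m - 2 * q, p - k) if k <= p else 0
                  for k in range(m + 1)]
                 for p in range(m + 1)]
        levels.append(level)
    return levels

m = 10
levels = case2_levels(m)

# Eq. (2): on every level q, summing over k recovers C(m, p).
for level in levels:
    assert all(sum(level[p]) == comb(m, p) for p in range(m + 1))

# Region size: (m // 2 + 1) levels, each (m + 1) x (m + 1).
print(len(levels), len(levels[0]), len(levels[0][0]))
```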

#### **3 Visual Results**

It is convenient to use color coding to transfer each coefficient into a pixel of a variant map. Invariant coefficients provide ideal conditions for a practical measurement, so it is feasible to check physical differences between an ideal distribution and a practical measurement.
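One possible way to turn a coefficient array into pixel values is sketched below. The logarithmic scaling and the function name `to_grayscale` are assumptions made for illustration; the chapter does not state which color coding was used for its figures.

```python
from math import comb, log

def to_grayscale(array):
    """Map nonnegative integer coefficients to 0..255 pixel values.

    Assumption: a logarithmic scale, so that small nonzero coefficients
    remain distinguishable from 0 next to much larger ones.
    """
    peak = max(max(row) for row in array)
    if peak == 0:
        return [[0 for _ in row] for row in array]
    scale = 255 / log(1 + peak)
    return [[round(scale * log(1 + v)) for v in row] for row in array]

m = 10
# Case 1 coefficients f(m, p, k) = C(m-p, k) * C(p, k) as sample input.
coeffs = [[comb(m - p, k) * comb(p, k) for k in range(m + 1)]
          for p in range(m + 1)]
pixels = to_grayscale(coeffs)
```

Each entry of `pixels` can then be written out as one pixel of a greyscale image, or passed through a color table for a pseudo-color map.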

From a quantitative viewpoint, multinomial expressions provide a proper basis, through their corresponding partitions, for a relative measurement in representation.

#### *3.1 Case 1. Maps*

Using $\binom{m}{p} \to \left\{\binom{m-p}{k}\binom{p}{k}\right\}$, three kinds of maps are shown in Fig. 2 as 2D coefficients, 3D histograms, and 2D projections for the four parameters *m* = {10, 11, 15, 16}, respectively.

#### *3.2 Case 2. Maps*

In Case 1, each *m* is associated with a single 2D coefficient array. In contrast, under $\binom{m}{p} \to \left\{\binom{2q}{k}\binom{m-2q}{p-k}\right\}$ each selection of *q* determines a 2D array of coefficients. Under the condition 0 ≤ *q* ≤ ⌊*m*/2⌋, ⌊*m*/2⌋ + 1 levels are required. For *m* = 10, 6 levels are needed.

To observe global properties, a 3D color map is shown in Fig. 4 to illustrate the 3D coefficients under color coding.

#### **4 Result Analysis**

In the maps of Figs. 1, 2, and 3, it is convenient to see variant maps transformed from elementary equations. The {*m* − *p*}{*p*} coefficients are symmetric in the horizontal direction, with a reflection between *p* and *m* − *p*. Nontrivial coefficients are located in a 1/4 region of the (*m* + 1)<sup>2</sup> square, and all nontrivial coefficients together form an isosceles triangle. For any selected *m*, there is only one associated 2D coefficient array, which forms a unified distribution.

The {2*q*}{*m* − 2*q*} coefficients correspond to multiple 2D distributions under various *q* values. When *q* = 0, each nontrivial coefficient is located on the diagonal position *p* = *k*, and each coefficient is given by $\binom{2q}{k}\binom{m-2q}{p-k}$. For 0 ≤ *q* ≤ 5, the 2D coefficient matrices are shown in six groups {0 : 10, 2 : 8, 4 : 6, 6 : 4, 8 : 2, 10 : 0}; these can be described as the coefficient distributions of $(x + y)^{n+l} = (x + y)^n (x + y)^l$, illustrated in the Fig. 3 maps {(*a*0)−(*c*0)} to {(*a*5)−(*c*5)}.

**Fig. 2** {*m* − *p*}{*p*} maps: *m* = {10*,* 11*,* 15*,* 16}; *(a*1*)*−*(d*1*) m* = 10; *(a*2*)*−*(d*2*) m* = 11; *(a*3*)*−*(d*3*) m* = 15; *(a*4*)*−*(d*4*) m* = 16

**Fig. 3** {2*q*}{*m* − 2*q*} maps: *m* = 10; *(a*0*)*−*(c*0*)q* = 0; *(a*1*)*−*(c*1*)q* = 1; *(a*2*)*−*(c*2*)q* = 2; *(a*3*)*−*(c*3*)q* = 3; *(a*4*)*−*(c*4*)q* = 4; *(a*5*)*−*(c*5*)q* = 5

**Fig. 4** {2*q*}{*m* − 2*q*} map:*m* = 10; 3D color map

#### **5 Conclusion**

Using elementary equations to illustrate relevant variant maps is a new exploration. Based on the described model and calculation, it is convenient to perform various analyses and visualizations. Checking two invariants from Type B for four variant measures is an initial step; further exploration is required on the five levels of 11 nontrivial invariants in Type B. From the results in this chapter, distinct distributions are observed for the two selected invariants (Fig. 4). The other nine invariants in Type B will be discussed in future papers.

**Acknowledgements** The author would like to thank Yifeng Zheng and Kaiyu Yang for generating binomial coefficients in different conditions and Dr. Dennis Heim for correction of the chapter.


# **Variant Map System of Random Sequences**

**Jeffrey Zheng**

**Abstract** Sequences of random variables play a key role in probability theory, stochastic processes, and statistics to analyze dynamic behavior. Speckle patterns have emerged as useful tools to explore space–time variations of random sequences in various measurement applications of comprehensive properties in complex space– time variation events. In this chapter, a variant map system is proposed to analyze statistical properties of random sequences in visual representations. An input 0–1 sequence will be divided into multiple segments and each segment of a fixed length will be transformed into a 2-tuple pair of measures. Five measuring sets are identified and rearranged in a 1D or 2D numerical array as a histogram representing a visual map. These five types of maps consist of two types in 1D format as classical maps and three types in 2D format as variant maps. Properties are analyzed on all five types of maps. A cryptographic sequence of the AES cipher is selected as a sample stream. The five types of visual maps are generated and refined clustering characteristics are organized into four groups on changes of segmented and shifted lengths for visual comparisons on enlarged 2DP maps. Speckle patterns of various distributions are observed. Three variant maps with distinct statistic distributions could be useful to provide new visual tools to explore comprehensive cryptographic sequences on complex nonlinear dynamic behavior in global network environments.

**Keywords** Variant map · Visual representation · Multiple segment · Statistical probability distribution · Clustering characteristics

J. Zheng (B)

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_7

## **1 Introduction**

Along with network communication and internet technology [1] in global applications, web communication, the internet of things, cloud computing, big data, mobile phones, and smart wireless technologies [2] have developed significantly over the last decade and been widely adopted across the world market. In this situation, a key issue for cryptographic researchers and applications [3] is to use advanced stream cipher technologies to protect the data security of ultrafast and extra-large data streams in global network environments.

#### *1.1 Pseudo-Random Sequences*

#### **1.1.1 From Linear Stream Ciphers**

Traditional stream ciphers [4] based on the Linear Feedback Shift Register (LFSR) structure (in military cryptography) are used as pseudo-random number generators because of their ease of implementation in simple hardware, long periods, and uniformly distributed streams. LFSR stream ciphers are the core of classical stream ciphers, supported by the mathematical theory of algebraic functions for system simulation and analysis.

However, an LFSR is a linear system, which leads to fairly easy cryptanalysis using the Berlekamp–Massey algorithm. Important LFSR-based stream ciphers include A5/1 and A5/2 in GSM cell phones and E0 in Bluetooth. The A5/2 cipher has been broken, and both A5/1 and E0 have serious weaknesses [5, 6].
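To make the linearity weakness concrete, the sketch below generates bits from a small 4-stage LFSR and recovers its length with a textbook Berlekamp–Massey routine over GF(2). The recurrence, the seed, and the function names are illustrative choices and are not taken from any of the ciphers cited above.

```python
def lfsr_bits(seed, n):
    """Generate n bits from the recurrence s[t] = s[t-3] XOR s[t-4].

    The characteristic polynomial is x^4 + x + 1 (an illustrative choice);
    `seed` supplies the first four bits.
    """
    s = list(seed)
    while len(s) < n:
        s.append(s[-3] ^ s[-4])
    return s[:n]

def berlekamp_massey(bits):
    """Return the linear complexity of a 0-1 sequence over GF(2)."""
    n = len(bits)
    c = [0] * n          # current connection polynomial
    b = [0] * n          # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, last = 0, -1
    for i in range(n):
        # Discrepancy between the predicted bit and the actual bit.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - last
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, last, b = i + 1 - length, i, t
    return length

stream = lfsr_bits([1, 0, 0, 0], 30)
print(berlekamp_massey(stream))
```

In this example a short prefix of the output already reveals a register length of 4, which is why a bare LFSR is not used on its own as a stream cipher.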

#### **1.1.2 From Nonlinear Stream Ciphers**

The new generation of stream ciphers [7, 8] are widely used in advanced web communications. Three general methods are applied to improve security weaknesses in LFSR-based stream ciphers:


Along with these methods, a series of nonlinear algorithms has emerged [12]: nonlinear equivalence [13], evolutionary methods [10], AES cipher [14], RC4 [15], ZUC [9], cellular automata [16], and nonlinear dynamic system [17].

The new generation of stream ciphers is shifting from the traditional LFSR mode [4] to various nonlinear modes: NLFSR [18, 19], clock control [11], nonlinear functions [9], etc. It is essential for ciphers to be integrated and implemented [20] to satisfy security models. However, unlike LFSR, which has well-established linear mathematical theories and simulation tools, it is extremely difficult to apply advanced nonlinear mathematical theories, recursive models, descriptive tools, and implementation schemes [17] in nonlinear dynamic environments.

How to evaluate cryptographic sequences generated from the nonlinear stream ciphers is an urgent problem for modern stream ciphers.

## *1.2 Truly Random Sequences from Hardware Devices and Speckle Patterns*

In addition to pseudo-random sequences generated by stream ciphers, truly random sequences are generated by high-quality stochastic oscillators in special hardware devices, using laser photonics [21], nonlinear optics [22], quantum optics [23], quantum noise [24], thermal noise [25], and chaos and fractal nonlinear dynamics [26].

A number of truly random number generators have been developed to extract stochastic information from speckle patterns [27], e.g., random bits from turbulence [28] to obtain random numbers from the speckle positions, generation of random arrays using laser speckle [29], 2D generation of random numbers by multimode fiber speckle [30], Markov speckle for efficient random bit generation [31], and dynamic laser speckle and applications [11].

Since various truly random sequences are created from specific physical models with special principles and uncertain methodologies, it is extremely difficult for cryptographic researchers to make proper measurements that explore their nonlinear dynamic properties.

#### *1.3 Statistic Testing Packages on Cryptographic Sequences*

Randomness has been explored for many years [32] through a series of statistical testing theories and methods. The NIST 800-22 testing package [33] is an effective statistical package for random sequences, collecting a set of 16 statistical tests to evaluate the statistical properties of cryptographic sequences. Statistical testing packages are very useful for obtaining quantitative measurements that evaluate the randomness of cryptographic sequences in wider applications. However, the tests in the various packages mainly focus on the P-value or on a list of static properties of the sequence under test.

Since comprehensive behaviors in nonlinear dynamics may increase computational complexity dramatically by involving complicated dynamic properties in a multivariate environment, those dynamic behaviors are completely ignored.
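As a small illustration of a P-value-based static test, the sketch below implements the frequency (monobit) test in the form commonly given for NIST SP 800-22: the bits are mapped to ±1, summed, and the P-value is erfc(|*S*|/√(2*n*)). The function name is hypothetical, and the example inputs are chosen only to show that this single test is insensitive to the order of the bits.

```python
from math import erfc, sqrt

def monobit_p_value(bits):
    """Frequency (monobit) test: P-value for the balance of 0s and 1s."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # map 0 -> -1, 1 -> +1 and add up
    return erfc(abs(s) / sqrt(2 * n))

alternating = [0, 1] * 50          # 0101...01
blocked = [0] * 50 + [1] * 50      # fifty 0s followed by fifty 1s
constant = [1] * 100

print(monobit_p_value(alternating))
print(monobit_p_value(blocked))
print(monobit_p_value(constant))
```

The first two inputs contain the same number of 0s and 1s, so this test assigns them the same P-value although their ordering is completely different; other tests in a package look at ordering, but each still reduces the sequence to a summary statistic.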

#### *1.4 Gaussian Distribution and Speckle Pattern*

Multivariate normal probability distribution models are the most important and powerful tools that are used to test stochastic characteristics of a random data sequence [34] under the framework of probability, stochastic process, and statistics [35] for nonlinear problems. In this kind of measuring models, when the data sequence is sufficiently long, the high-dimensional probability distribution of the sequence [36] is similar to the continuous Gaussian distribution.

A typical projection model is shown in Fig. 1a; the central part shows a Gaussian surface with an unbalanced distribution in a 2D plane, distributed as *P*(*X*, *Y*) measures with pseudo-colors, and its two 1D projections are shown in the horizontal *P*(*X*) and vertical *P*(*Y*) planes, respectively. In Fig. 1b, a standard Gaussian surface with a symmetric shape is illustrated, and the 2D projection of its pseudo-color map is shown in Fig. 1c with an ideal continuous distribution of color on the map. Unlike these ideally continuous distributions, Fig. 1d shows a real image generated by the laser speckle phenomenon [37]: an objective speckle pattern [38] scattered by a laser beam from a plastic surface onto a wall. It is convenient to compare the different color maps in Fig. 1c and d.
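For reference, the bivariate normal density behind a projection model of this kind, and the 1D projection obtained by integrating out one variable, can be written as follows. The means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$, and correlation coefficient $\rho$ are notation introduced here for illustration and do not appear in the original text.

$$
P(X, Y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(X-\mu_X)^2}{\sigma_X^2} - \frac{2\rho (X-\mu_X)(Y-\mu_Y)}{\sigma_X \sigma_Y} + \frac{(Y-\mu_Y)^2}{\sigma_Y^2} \right] \right)
$$

$$
P(X) = \int_{-\infty}^{\infty} P(X, Y)\, dY = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left( -\frac{(X-\mu_X)^2}{2\sigma_X^2} \right)
$$

The projection *P*(*Y*) follows by symmetry. With $\rho = 0$ and $\sigma_X = \sigma_Y$ the surface is rotationally symmetric, as in the symmetric case of Fig. 1b.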

From this set of figures, the relationship between the projection curve and the two 1D Gaussian distributions can be observed in the multivariate normal probability environment. Multivariate Gaussian probability distributions may support classical schemes for analyzing complex stochastic data sets of measuring sequences in many applications under continuous conditions. However, the speckle pattern in Fig. 1d is an intrinsically discrete random pattern that may not be easily simulated by the smoothed Gaussian map in Fig. 1c; further exploration of proper simulation and control mechanisms is required.

#### *1.5 Controlling Deterministic Chaos*

Controlling deterministic chaos has been an active R&D field in nonlinear dynamics over the past decades. Since the pioneering work, significant progress has been achieved in controlling spatiotemporal chaos [39], plasma devices, laser systems [40], chemical reactions, and biological systems, with both spatial and temporal dependence considered. The complex Ginzburg–Landau equation (CGLE) system [41] describes universal dynamic features near a supercritical Hopf bifurcation. It exhibits defect-mediated turbulence or spiral turbulence in a wide parameter region. Control by generating a spiral wave seed that grows into a stable spiral in the CGLE system has been described [42, 43].

Systematic approaches on simulation of nonlinear behaviors, speckle phenomena in optics [37] and pattern dynamics [44] have been actively explored.

**Fig. 1** Multivariate Gaussian Probability Distributions and an objective speckle pattern; **a** Bivariate normal distribution with two probability projections; **b** A symmetric bivariate normal surface with pseudo-colors; **c** A 2D pseudo-color map of the symmetric bivariate normal surface; **d** An objective speckle pattern scattered by a laser beam from a plastic surface onto a wall. [38]

#### *1.6 Poincaré Map*

From a measuring viewpoint, spatial variations of a stochastic sequence will be changed by overall macro characteristics showing statistic measurements of distributed patterns [45] in a vector space, so that a random sequence is measured by an analytic space. From an analysis viewpoint, the Poincaré section [46] corresponds to a discrete map proposed by the eminent French scientist Henri Poincaré 100 years ago.

The Poincaré map handles additional information from sequential changes of ordered measurements in the phase space of classical dynamics, nonlinear dynamic systems [47] and chaos.

The mapping mechanism of the Poincaré map may be useful to handle dynamic patterns on cryptographic sequences of stream ciphers. This mapping scheme has been applied to observe the global randomness of cellular automata sequences on 2D maps [48] 20 years ago.

#### *1.7 Variant Framework*

Various schemes following the top-down strategy are explored to use multiple measures to partition special phase spaces from a top state set to multiple bottom states via multi-levels of a hierarchy in combinatorial algorithms [49], image analysis and processing for many years.

The conjugate classification [50] is proposed to apply seven measures in a hierarchy to partition the kernels of four regular plane lattices on *n* = {4*,* 5*,* 7*,* 9} cases for 2D binary images. For 1D cellular automata sequences, global random behaviors [48] are visualized in 2D maps.

For *n*-tuple bit vectors, the variant logic framework [51] was proposed and various applications were explored: a 3D visual method on random number sequences [52], the variant Pseudo-Random Number Generator (PRNG) [53, 54], computational simulation of quantum interactions [55, 56], noncoding DNA analysis [57], and bat echolocation [58].

#### *1.8 Proposed Scheme*

For the purpose of system characterization based on comprehensive measurements of cryptographic sequences, we propose a variant map system for a 0–1 stochastic sequence of length *N*. The sequence is divided into *M* segments of a given length *m*. A 2-tuple pair of measures is extracted from each 0–1 segment: the number of 1 elements and the number of 01 patterns in the segment. The paired measures form a sequence of *M* pairs, an ordered measuring set with *M* elements.

The pairs of the measuring sequence are directly separated into two independent measuring sequences, keeping each parameter in the same order. Applying the pairing scheme of the Poincaré section, each single measuring sequence can be reorganized by taking two consecutive measures as a 2-tuple pair. The two measuring sequences from the Poincaré section and the original sequence of paired measures make up three sequences of 2-tuple measures. A total of five sequences of distinct measures is thus constructed: two sequences of single measures and three sequences of 2-tuple measures.

Following this approach, the two single measuring sequences are sorted into two 1D numerical arrays as statistical histograms, giving the classic 1D maps, and the three 2-tuple measuring sequences are sorted into three 2D integer arrays as statistical histograms, giving the three variant maps. By changing the segment length and the shift displacement, the five measuring sequences are transformed into 1D statistical histograms and 2D pseudo-color maps that show effective speckle patterns of the selected cryptographic sequence under various combinations of the two control parameters.

#### *1.9 Organization of the Chapter*

This chapter describes the variant map system in diagrams of the system architecture and the core modules with input/output and processing functions in Sect. 2. In Sect. 3, the relationships among measuring sequences and the five statistical distribution maps are analyzed. In Sect. 4, an AES cipher sequence is selected to form a series of statistical maps based on changes of the two control parameters. From the results of the visual maps in Sect. 4, intuitive analysis and brief comparisons are carried out in Sect. 5. Finally, in Sect. 6, the main results are summarized.

#### **2 Framework of Variant Map System**

#### *2.1 Framework*

For the variant map system, the block diagrams of the system framework and the core modules of the system are shown in Fig. 2. The framework of the system architecture in Fig. 2a is composed of three core modules: the Shift Segment Measurement SSM, the Measuring Sequence Combination MSC, and the Projective Color Map PCM. The three modules are shown in Fig. 2b–d in more detail, respectively.

#### *2.2 Shift Segment Measurement SSM*

The SSM module is shown in Fig. 2b.

Let *X* be a 0–1 vector with *N* elements as an input sequence,

$$X = X[0]X[1] \cdots X[I] \cdots X[N-1], \quad 0 \le I < N; \quad X[I] \in \{0, 1\} \tag{1}$$

**Fig. 2** The framework of the variant map system for cryptographic sequences; **a** The system architecture; **b** The SSM module; **c** The MSC module; **d** The PCM module

The SSM module consists of two processing units: the Vector Shift VS and the Segment Measurement SM, respectively. The two input control parameters: {*r, m*} are defined as shift length *r* and segment length *m*.

Let *Y* be a 0–1 vector with *N* elements. This vector is generated from the input sequence by a shift operation under the loop displacement condition (i.e., a cyclic shift right + or shift left −):

$$Y = X(r), \quad Y[I] = X[I \pm r], \quad I \pm r \pmod{N}, \quad 0 \le I < N; \quad X[I], Y[I] \in \{0, 1\} \tag{2}$$

The shifted vector is passed to the SM unit for segmentation. The long sequence of *N* elements is divided into *M* = ⌊*N*/*m*⌋ segments, a set of sub-vectors each containing *m* bits. The element at the *j*-th position, 0 ≤ *j* < *m*, of the *i*-th sub-vector, 0 ≤ *i* < *M*, is denoted $Y_{i,j}$.

After segmentation, this sequence of sub-vectors forms the following *M* × *m* matrix. The *m* positions of the *i*-th complete row vector correspond to a 2-tuple pair of measures $(p_i, q_i)$, and the incomplete part after the last complete sub-vector is ignored.

$$Y = \begin{bmatrix} Y_{0,0} & Y_{0,1} & \cdots & Y_{0,j} & \cdots & Y_{0,m-1} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ Y_{i,0} & Y_{i,1} & \cdots & Y_{i,j} & \cdots & Y_{i,m-1} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ Y_{M-1,0} & Y_{M-1,1} & \cdots & Y_{M-1,j} & \cdots & Y_{M-1,m-1} \end{bmatrix} \rightarrow \begin{bmatrix} (p_0, q_0) \\ \vdots \\ (p_i, q_i) \\ \vdots \\ (p_{M-1}, q_{M-1}) \end{bmatrix} \tag{3}$$

The 2-tuple pair of measures $(p_i, q_i)$ is determined by the following formulas:

$$\begin{aligned} Y_{i,j} &= Y[J] \in \{0, 1\}; \; J = i \times m + j, \\ &0 \le i < M, \; 0 \le j < m, \; 0 \le J < m \times M \le N \end{aligned} \tag{4}$$

$$p_i = \sum_{j=0}^{m-1} Y_{i,j}, \quad Y_{i,j} \in \{0, 1\}, \quad 0 \le p_i \le m; \tag{5}$$

$$q_i = \sum_{j=0}^{m-1} \mathbf{1}\big[(Y_{i,j-1}, Y_{i,j}) = (0, 1)\big], \quad j - 1 \pmod{m}, \quad 0 \le q_i \le \lfloor m/2 \rfloor; \tag{6}$$

For example, $X = 0011010010$, $N = 10$, $M = 2$, $m = 5$; $(p_0 = 2, q_0 = 1)$; $(p_1 = 2, q_1 = 2)$.

The parameter *pi* is the number of single elements in the *i*-th sub-vector, the parameter *qi* is the number of 01 pattern overlapped in the *i*-th sub-vector in a cyclic condition. For any segment *m >* 0*,* 0 ≤ *pi* ≤ *m,* 0 ≤ *qi* ≤ *m/*2, all segments are transformed from a random sequence with *N* elements into a measuring sequence with *M* elements.

The SSM module outputs the ordered pairs of 2-tuple measures {*pi, qi*} *M*−1 *<sup>i</sup>*=<sup>0</sup> .
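As a minimal sketch of the segment measures defined above, the following Python function computes the pairs (*p<sub>i</sub>*, *q<sub>i</sub>*) for every complete segment. The name `segment_measures` is hypothetical. The sketch relies on Python's negative indexing (`seg[-1]` is the last element) to realize the cyclic condition *j* − 1 (mod *m*).

```python
def segment_measures(y, m):
    """Return [(p_i, q_i)] for the M = len(y) // m complete segments of y."""
    num_segments = len(y) // m          # the incomplete tail is ignored
    measures = []
    for i in range(num_segments):
        seg = y[i * m:(i + 1) * m]
        p = sum(seg)                    # number of 1 elements
        # cyclic count of 01 patterns; for j = 0, seg[j - 1] wraps to seg[m - 1]
        q = sum(1 for j in range(m) if (seg[j - 1], seg[j]) == (0, 1))
        measures.append((p, q))
    return measures


x = [int(b) for b in "0011010010"]
print(segment_measures(x, 5))   # [(2, 1), (2, 2)]
```

The printed result reproduces the worked example with *m* = 5 given in the text.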

#### *2.3 Measuring Sequence Combination MSC*

The MSC module, shown in Fig. 2c, is composed of two units: the Measuring Split (MS) unit and the Measuring Combination (MC) unit. The MS unit processes the output of the SSM module and splits the measuring sequence of 2-tuple measures into two independent measuring sequences, {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup> and {*q<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>, keeping the original number of measures invariant.

The MC unit recombines each single measuring sequence by pairing consecutive elements, forming two independent measuring sequences of 2-tuple measures: {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup> → {(*p*<sub>*i*−1</sub>, *p<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup> and {*q<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup> → {(*q*<sub>*i*−1</sub>, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, with *i* − 1 taken (mod *M*), to provide appropriate sequences for subsequent processing modules.

The MSC module produces the following four measuring sequences: {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>, {*q<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>, {(*p*<sub>*i*−1</sub>, *p<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, and {(*q*<sub>*i*−1</sub>, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>.
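The split and combination steps can be sketched as follows. The function name `split_and_combine` is hypothetical, and the cyclic predecessor *i* − 1 (mod *M*) is again obtained through Python's negative indexing.

```python
def split_and_combine(measures):
    """Split [(p_i, q_i)] into p, q and pair each element with its cyclic predecessor."""
    p = [pq[0] for pq in measures]                    # MS unit: p sequence
    q = [pq[1] for pq in measures]                    # MS unit: q sequence
    pp = [(p[i - 1], p[i]) for i in range(len(p))]    # MC unit: (p_{i-1}, p_i)
    qq = [(q[i - 1], q[i]) for i in range(len(q))]    # MC unit: (q_{i-1}, q_i)
    return p, q, pp, qq


measures = [(2, 1), (2, 2), (3, 1)]
p, q, pp, qq = split_and_combine(measures)
print(pp)   # [(3, 2), (2, 2), (2, 3)]
print(qq)   # [(1, 1), (1, 2), (2, 1)]
```

All four output sequences have the same length *M* as the input, so the number of measures stays invariant.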

#### *2.4 Projective Color Map PCM*

The PCM module consists of two units: PA and CM. The five measuring sequences are processed separately as 1D and 2D measures.

The PA unit processes relevant measuring sequences to transform them into integer arrays and the CM unit will visualize these on either normalized histograms (1D measures) or color maps (2D measures), respectively.

#### **2.4.1 1D Measures**

The 1D measures involve two measuring sequences: {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup> and {*q<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>. Let *P*[*m* + 1], *Q*[⌊*m*/2⌋ + 1] be two 1D integer arrays and *NP*[*m* + 1], *NQ*[⌊*m*/2⌋ + 1] be two 1D float arrays representing the corresponding elements, defined as follows.

#### **2.4.2 1DP Map**

The 1DP statistical histogram: for a sequence {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>, *NP* and *P* are two arrays (float and integer, respectively) with (*m* + 1) elements. The *j*-th elements *NP*[*j*], *P*[*j*], 0 ≤ *j* ≤ *m*, are obtained by the following procedure:

> Initialization: ∀ *NP*[*j*] = 0.0, *P*[*j*] = 0, 0 ≤ *j* ≤ *m*;
>
> Calculation: for (*i* = 0; *i* < *M*; *i*++) { *P*[*p<sub>i</sub>*]++; }
>
> Normalization: for (*j* = 0; *j* ≤ *m*; *j*++) { *NP*[*j*] = *P*[*j*]/*M*; }

In the 1DP map, the PA unit corresponds to Initialization and Calculation; the CM unit handles Normalization.

#### **2.4.3 1DQ Map**

The 1DQ statistical histogram: for a sequence {*q<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup>, *NQ* and *Q* are two arrays (float and integer, respectively) with (⌊*m*/2⌋ + 1) elements. The *j*-th elements *NQ*[*j*], *Q*[*j*], 0 ≤ *j* ≤ ⌊*m*/2⌋, are obtained by the following procedure:

> Initialization: ∀ *NQ*[*j*] = 0.0, *Q*[*j*] = 0, 0 ≤ *j* ≤ ⌊*m*/2⌋;
>
> Calculation: for (*i* = 0; *i* < *M*; *i*++) { *Q*[*q<sub>i</sub>*]++; }
>
> Normalization: for (*j* = 0; *j* ≤ ⌊*m*/2⌋; *j*++) { *NQ*[*j*] = *Q*[*j*]/*M*; }

Using the *P*, *NP*, *Q*, and *NQ* arrays, it is possible to generate the corresponding 1D statistical histograms as 1D maps.

In the 1DQ map, the PA unit corresponds to Initialization and Calculation; the CM unit handles Normalization.
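The Initialization, Calculation, and Normalization steps for both 1D maps can be sketched with one helper. The name `histogram_1d` is hypothetical, and the short input sequences are made-up illustrative values rather than measurements from the chapter. The sketch assumes *M* > 0.

```python
def histogram_1d(values, size):
    """Return (counts, normalized) for integer values in range(size)."""
    counts = [0] * size                          # Initialization
    for v in values:                             # Calculation
        counts[v] += 1
    total = len(values)                          # M, assumed > 0
    normalized = [c / total for c in counts]     # Normalization
    return counts, normalized


m = 5
p_seq = [2, 2, 3, 0]      # illustrative p_i values, 0 <= p_i <= m
q_seq = [1, 2, 1, 0]      # illustrative q_i values, 0 <= q_i <= m // 2
P, NP = histogram_1d(p_seq, m + 1)
Q, NQ = histogram_1d(q_seq, m // 2 + 1)
print(P, NP)
print(Q, NQ)
```

The same helper serves the 1DP and 1DQ maps because the two procedures differ only in the array size.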

#### **2.4.4 2D Measures**

The 2D measures process three measuring sequences: {(*p*<sub>*i*−1</sub>, *p<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, {(*q*<sub>*i*−1</sub>, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, and {(*p<sub>i</sub>*, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>. Let *P*[*m* + 1 : *m* + 1], *Q*[⌊*m*/2⌋ + 1 : ⌊*m*/2⌋ + 1], and *PQ*[*m* + 1 : ⌊*m*/2⌋ + 1] be three 2D integer arrays representing the corresponding elements, defined as follows.

#### **2.4.5 2DP Map**

The 2DP statistical histogram: for a sequence {(*p*<sub>*i*−1</sub>, *p<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, *P* is a 2D integer array with (*m* + 1)<sup>2</sup> elements. The (*i, j*)-th element *P*[*i, j*], 0 ≤ *i, j* ≤ *m*, is obtained by the following procedure:

> Initialization: ∀ *P*[*i, j*] = 0, 0 ≤ *i, j* ≤ *m*;
>
> Calculation: *P*[*p*<sub>*M*−1</sub>, *p*<sub>0</sub>]++; for (*i* = 1; *i* < *M*; *i*++) { *P*[*p*<sub>*i*−1</sub>, *p<sub>i</sub>*]++; }
>
> Pseudo-color: match a proper color to every *P*[*i, j*], 0 ≤ *i, j* ≤ *m*

In the 2DP map, the PA unit corresponds to Initialization and Calculation; the CM unit handles pseudo-color.

#### **2.4.6 2DQ Map**

The 2DQ statistical histogram: for a sequence {(*q*<sub>*i*−1</sub>, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, *Q* is a 2D integer array with (⌊*m*/2⌋ + 1)<sup>2</sup> elements. The (*i, j*)-th element *Q*[*i, j*], 0 ≤ *i, j* ≤ ⌊*m*/2⌋, is obtained by the following procedure:

> Initialization: ∀ *Q*[*i, j*] = 0, 0 ≤ *i, j* ≤ ⌊*m*/2⌋;
>
> Calculation: *Q*[*q*<sub>*M*−1</sub>, *q*<sub>0</sub>]++; for (*i* = 1; *i* < *M*; *i*++) { *Q*[*q*<sub>*i*−1</sub>, *q<sub>i</sub>*]++; }
>
> Pseudo-color: match a proper color to every *Q*[*i, j*], 0 ≤ *i, j* ≤ ⌊*m*/2⌋

In the 2DQ map, the PA unit corresponds to Initialization and Calculation; the CM unit handles Pseudo-color.

#### **2.4.7 2DPQ Map**

The 2DPQ statistical histogram: for a sequence {(*p<sub>i</sub>*, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>, *PQ* is a 2D integer array with (*m* + 1) × (⌊*m*/2⌋ + 1) elements. The (*i, j*)-th element *PQ*[*i, j*], 0 ≤ *i* ≤ *m*, 0 ≤ *j* ≤ ⌊*m*/2⌋, is obtained by the following procedure:

> Initialization: ∀ *PQ*[*i, j*] = 0, 0 ≤ *i* ≤ *m*, 0 ≤ *j* ≤ ⌊*m*/2⌋;
>
> Calculation: for (*i* = 0; *i* < *M*; *i*++) { *PQ*[*p<sub>i</sub>*, *q<sub>i</sub>*]++; }
>
> Pseudo-color: match a proper color to every *PQ*[*i, j*], 0 ≤ *i* ≤ *m*, 0 ≤ *j* ≤ ⌊*m*/2⌋

In the 2DPQ map, the PA unit corresponds to Initialization and Calculation; the CM unit handles Pseudo-color.

Through the PCM module, the five measuring sequences are transformed into two 1D arrays and three 2D arrays with (*m* + 1), (⌊*m*/2⌋ + 1), (*m* + 1)<sup>2</sup>, (⌊*m*/2⌋ + 1)<sup>2</sup>, and (*m* + 1) × (⌊*m*/2⌋ + 1) clusters, respectively.
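The counting part of the three 2D procedures can be sketched with a single helper. The name `histogram_2d` and the short *p*, *q* sequences are illustrative assumptions. The pseudo-color step is omitted here because it depends on the plotting tool used.

```python
def histogram_2d(pairs, rows, cols):
    """Count integer pairs (a, b) into a rows x cols grid of zeros."""
    grid = [[0] * cols for _ in range(rows)]     # Initialization
    for a, b in pairs:                           # Calculation
        grid[a][b] += 1
    return grid


m = 4
p = [2, 3, 2, 1]          # illustrative p_i values, 0 <= p_i <= m
q = [1, 1, 2, 1]          # illustrative q_i values, 0 <= q_i <= m // 2
M = len(p)

# i - 1 is taken mod M: for i = 0, p[i - 1] is p[M - 1]
P2 = histogram_2d([(p[i - 1], p[i]) for i in range(M)], m + 1, m + 1)
Q2 = histogram_2d([(q[i - 1], q[i]) for i in range(M)], m // 2 + 1, m // 2 + 1)
PQ = histogram_2d(list(zip(p, q)), m + 1, m // 2 + 1)

for name, grid in (("2DP", P2), ("2DQ", Q2), ("2DPQ", PQ)):
    print(name, grid)
```

Each grid can then be handed to any color-mapping routine to obtain the corresponding 2D map.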

The final results of the variant map system are five maps: 1DP, 1DQ, 2DP, 2DQ, and 2DPQ as expected statistic distributions of the input 0–1 sequence.

#### **3 Sequence Analysis**

#### *3.1 Ideal Condition*

From the viewpoint of sequence analysis, sorting the measuring sequence {*p<sub>i</sub>*}<sub>*i*=0</sub><sup>*M*−1</sup> into a 1D statistical histogram is a classical technique. When the measuring sequence meets ideal conditions, the 1D statistical distribution is a binomial distribution.

**Lemma 1** *For an input 0–1 sequence, if the total number of segments is M = 2<sup>m</sup> and each distinct segment of m bits appears exactly once in the sequence, then the 1DP array satisfies the binomial distribution:*

$$P[i] = \binom{m}{i}, 0 \le i \le m \tag{7}$$

**Corollary 1** *If the input sequence meets the conditions of Lemma 1, then the total number of items in the 1DP array is equal to*

$$\sum\_{i=0}^{m} P[i] = 2^{m} = M \tag{8}$$

**Lemma 2** *If the input sequence meets the conditions of Lemma 1, then the 1DQ array satisfies the following relation:*

$$Q[i] = 2\binom{m}{2i}, 0 \le i \le \lfloor m/2 \rfloor\tag{9}$$

**Corollary 2** *If the input sequence meets the conditions of Lemma 1, then the total number of items in the 1DQ array is equal to*

$$\sum\_{i=0}^{\lfloor m/2 \rfloor} Q[i] = 2^{m} = M \tag{10}$$
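The ideal condition can be checked numerically for small *m* by enumerating every *m*-bit segment exactly once. The sketch below is only an illustration, and the function name `ideal_histograms` is hypothetical. It counts the 1DP and 1DQ arrays by brute force and compares them with the closed forms of Lemmas 1 and 2.

```python
from itertools import product
from math import comb


def ideal_histograms(m):
    """Count P and Q over all 2**m distinct m-bit segments."""
    P = [0] * (m + 1)
    Q = [0] * (m // 2 + 1)
    for seg in product((0, 1), repeat=m):    # each segment appears exactly once
        P[sum(seg)] += 1
        # cyclic count of 01 patterns; seg[-1] wraps around for j = 0
        Q[sum(1 for j in range(m) if (seg[j - 1], seg[j]) == (0, 1))] += 1
    return P, Q


m = 6
P, Q = ideal_histograms(m)
print(P)   # [1, 6, 15, 20, 15, 6, 1]
print(Q)   # [2, 30, 30, 2]
print(P == [comb(m, i) for i in range(m + 1)])
print(Q == [2 * comb(m, 2 * i) for i in range(m // 2 + 1)])
```

The factor 2 in the 1DQ array reflects that a cyclic segment is fixed by choosing its 2*i* change positions among *m* cyclic boundaries and then one of two starting values.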

#### *3.2 General Condition*

**Theorem 1** *For any 0–1 sequence with N elements, the projections of the 2DP array in the vertical and horizontal directions both equal the 1DP array.*

*Proof* A 2DP array {*P*[*i, j*]}, 0 ≤ *i, j* ≤ *m*, is generated from the measuring sequence {(*p*<sub>*i*−1</sub>, *p<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>. Its projections in the two directions are

$$P[i] = \sum\_{j=0}^{m} P[i, j], \; 0 \le i \le m; \qquad P[j] = \sum\_{i=0}^{m} P[i, j], \; 0 \le j \le m.$$

Because the index *i* − 1 is taken (mod *M*), every *p<sub>i</sub>* appears exactly once as a first component and exactly once as a second component, so {*P*[*i*]}<sub>*i*=0</sub><sup>*m*</sup> = {*P*[*j*]}<sub>*j*=0</sub><sup>*m*</sup>. Both projections are the same 1DP array.

**Corollary 3** *For an arbitrary input sequence, the total number of items in the 2DP array is equal to*

$$\sum\_{i=0}^{m} \sum\_{j=0}^{m} P[i, j] = \sum\_{i=0}^{m} P[i] = M \tag{11}$$

**Theorem 2** *For any 0–1 sequence with N elements, the projections of the 2DQ array in both directions equal the 1DQ array.*

*Proof* A 2DQ array {*Q*[*i, j*]}, 0 ≤ *i, j* ≤ ⌊*m*/2⌋, is generated from the measuring sequence {(*q*<sub>*i*−1</sub>, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>. Its projections in the two directions are

$$Q[i] = \sum\_{j=0}^{\lfloor m/2 \rfloor} Q[i, j], \; 0 \le i \le \lfloor m/2 \rfloor; \qquad Q[j] = \sum\_{i=0}^{\lfloor m/2 \rfloor} Q[i, j], \; 0 \le j \le \lfloor m/2 \rfloor;$$

so {*Q*[*i*]}<sub>*i*=0</sub><sup>⌊*m*/2⌋</sup> = {*Q*[*j*]}<sub>*j*=0</sub><sup>⌊*m*/2⌋</sup>. Both projections are the same 1DQ array.

**Corollary 4** *For an arbitrary input sequence, the total number of items in the 2DQ array is equal to*

$$\sum\_{i=0}^{\lfloor m/2 \rfloor} \sum\_{j=0}^{\lfloor m/2 \rfloor} Q[i, j] = \sum\_{i=0}^{\lfloor m/2 \rfloor} Q[i] = M \tag{12}$$

**Theorem 3** *For any 0–1 sequence with N elements, the two projections of the 2DPQ array correspond to the 1DP array and the 1DQ array, respectively.*

*Proof* A 2DPQ array {*PQ*[*i, j*]}, 0 ≤ *i* ≤ *m*, 0 ≤ *j* ≤ ⌊*m*/2⌋, is generated from the measuring sequence {(*p<sub>i</sub>*, *q<sub>i</sub>*)}<sub>*i*=0</sub><sup>*M*−1</sup>. Its projections in the two directions are

$$P[i] = \sum\_{j=0}^{\lfloor m/2 \rfloor} PQ[i, j], \; 0 \le i \le m; \qquad Q[j] = \sum\_{i=0}^{m} PQ[i, j], \; 0 \le j \le \lfloor m/2 \rfloor.$$

So the two projections correspond to the 1DP array and the 1DQ array, respectively.

**Corollary 5** *For an arbitrary 0–1 input sequence, the total number of items in the 2DPQ array is equal to*

$$\sum\_{i=0}^{m} \sum\_{j=0}^{\lfloor m/2 \rfloor} PQ[i, j] = M = \sum\_{i=0}^{m} P[i] = \sum\_{j=0}^{\lfloor m/2 \rfloor} Q[j] \tag{13}$$
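The projection properties of Theorems 1–3 and the totals of Corollaries 3–5 can be checked on an arbitrary sequence. The sketch below uses Python's `random` module with a fixed seed as a stand-in bit source, which is an assumption for illustration and not the AES sequence used later in the chapter. All helper names are hypothetical.

```python
import random


def segment_measures(bits, m):
    """Return [(p_i, q_i)] for the complete m-bit segments of bits."""
    out = []
    for i in range(len(bits) // m):
        seg = bits[i * m:(i + 1) * m]
        q = sum(1 for j in range(m) if (seg[j - 1], seg[j]) == (0, 1))
        out.append((sum(seg), q))
    return out


def hist2d(pairs, rows, cols):
    grid = [[0] * cols for _ in range(rows)]
    for a, b in pairs:
        grid[a][b] += 1
    return grid


def row_sums(grid):
    return [sum(row) for row in grid]


def col_sums(grid):
    return [sum(col) for col in zip(*grid)]


random.seed(0)                      # stand-in source, fixed for repeatability
m = 8
bits = [random.getrandbits(1) for _ in range(4000)]
pq = segment_measures(bits, m)
p = [a for a, _ in pq]
q = [b for _, b in pq]
M = len(pq)

P1 = [p.count(v) for v in range(m + 1)]           # 1DP array
Q1 = [q.count(v) for v in range(m // 2 + 1)]      # 1DQ array
P2 = hist2d([(p[i - 1], p[i]) for i in range(M)], m + 1, m + 1)
Q2 = hist2d([(q[i - 1], q[i]) for i in range(M)], m // 2 + 1, m // 2 + 1)
PQ = hist2d(pq, m + 1, m // 2 + 1)

print(row_sums(P2) == P1 == col_sums(P2))         # Theorem 1
print(row_sums(Q2) == Q1 == col_sums(Q2))         # Theorem 2
print(row_sums(PQ) == P1, col_sums(PQ) == Q1)     # Theorem 3
```

The equalities hold for any input because each measure enters every pair sequence exactly once in each component.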

**Corollary 6** *For an arbitrary input sequence, the five measuring sequences correspond to two 1D and three 2D arrays. Let |G| denote the number of possible clusters in G. If m > 3, then |2DP| > |2DPQ| > |2DQ| > |1DP| > |1DQ| is satisfied.*

*Proof* The five arrays (2DP, 2DPQ, 2DQ, 1DP, 1DQ) contain (*m* + 1)<sup>2</sup>, (*m* + 1) × (⌊*m*/2⌋ + 1), (⌊*m*/2⌋ + 1)<sup>2</sup>, (*m* + 1), and (⌊*m*/2⌋ + 1) items, respectively. If *m* > 3, then ⌊*m*/2⌋ ≥ 2, so (⌊*m*/2⌋ + 1)<sup>2</sup> > *m* + 1, and the remaining inequalities follow from *m* > ⌊*m*/2⌋. Hence all the inequalities are true.
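The cluster counts in Corollary 6 are easy to tabulate. The following sketch, with the hypothetical helper `cluster_counts`, evaluates the five sizes for a given segment length and shows why the condition *m* > 3 is needed: at *m* = 3 the 2DQ and 1DP arrays have the same size.

```python
def cluster_counts(m):
    """Return the sizes of the (2DP, 2DPQ, 2DQ, 1DP, 1DQ) arrays."""
    h = m // 2
    return ((m + 1) ** 2, (m + 1) * (h + 1), (h + 1) ** 2, m + 1, h + 1)


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


print(cluster_counts(8))     # (81, 45, 25, 9, 5)
print(cluster_counts(128))   # (16641, 8385, 4225, 129, 65)
print(cluster_counts(3))     # (16, 8, 4, 4, 2): 2DQ and 1DP tie
```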

#### *3.3 Brief Discussion*

Among the lemmas, theorems, and corollaries above, Lemmas 1 and 2 describe an ideal input sequence in which every distinct segment appears exactly once, i.e., the segments are uniformly distributed. Under this ideal condition, both the 1DP and 1DQ arrays are given by binomial coefficients. Corollaries 1 and 2 show that the totals of both arrays equal the number of segments of the ideal input sequence.

Theorems 1 and 2 establish projective conditions for any input sequence: the 1D projections of a 2DP or 2DQ array in the two directions are the same array. Theorem 3 states that, for any 2DPQ array, the two projections correspond to the 1DP and 1DQ arrays, respectively.

Corollaries 3 and 4 give the total counts of the 2DP and 2DQ arrays, respectively. Corollary 5, associated with Theorem 3, shows that the 2DPQ array shares the same total with the other four arrays: the total count of each of the five statistical arrays equals the number of segments *M*, and the 2DPQ array occupies a central position in the projections to the other arrays. Corollary 6 uses inequalities to order the numbers of items in the five arrays and to identify the maximal number of items involved in the structure.

From the viewpoint of complex stochastic sequence analysis, this partition mode corresponds to the maximum number of clusters that can be distinguished under the multiple-segment condition. Unlike surface analysis based on the multivariate Gaussian probability distribution, variant maps provide only a finite number of lattice points, which form space-related clusters at the projection positions. For longer segments, the 2DP array has the largest number of distinct items among the five arrays and therefore yields the most visible map, showing the most refined distribution details.

#### **4 Sample Maps**

Since the ideal distribution appears only under specific conditions, it is very difficult to use algebraic formulas to describe the measuring sequences on statistical maps of an arbitrary cryptographic sequence. For complicated data sequences, the most effective scheme is to generate the relevant maps computationally and then compare them. Among the five maps generated from an input 0–1 sequence, 2DP maps are selected more often in this section to illustrate how refined details change with segment length and shift length.

In this section, one cryptographic sequence generated from an AES cipher is selected as the sample sequence, and the control parameters are varied. The sample sequence has a fixed length of *N* = 10<sup>6</sup>, i.e., one million stochastic bits. The segment length *m* and the shift displacement *r* are varied, and the five maps are used to show the resulting statistical distributions.

#### *4.1 Dramatically Changing the Segment Lengths: 1DP, 1DQ, 2DP, 2DQ, and 2DPQ Maps, m = {8, 16, 128}, r = 0*

Three groups of Figs. 3, 4, and 5 are involved in comparison based on the five maps.

In Fig. 3, nine maps in 1DQ and 2DQ forms are shown for *m* = {8, 16, 128}, *r* = 0: (a)–(c) show three 1DQ maps with different segment lengths; (d)–(f) show 2DQ maps at normal size; and (g)–(i) show the same 2DQ maps enlarged.

**Fig. 3** 1DQ and 2DQ maps on *m* = {8, 16, 128}, *r* = 0; **a**–**c** 1DQ maps; **d**–**f** 2DQ Regular maps; **g**–**i** 2DQ Enlarged maps

In Fig. 4, 12 maps in 1DP, 2DPQ, and 1DQ forms are shown for *m* = {8, 16, 128}, *r* = 0: (a)–(c) show three 1DP maps with different segment lengths; (d)–(f) show 2DPQ maps at normal size; (g)–(i) show the same 2DPQ maps enlarged; and (j)–(l) show 1DQ maps for convenient comparison.

In Fig. 5, nine maps in 1DP and 2DP forms are shown for *m* = {8, 16, 128}, *r* = 0: (a)–(c) show three 1DP maps with different segment lengths; (d)–(f) show 2DP maps at normal size; and (g)–(i) show the same 2DP maps enlarged.

#### *4.2 Small Changes in Segment Lengths: 2DP Maps; Variation Series in Lengths of Segments, m = {125, 126, 127}, r = 0*

Two groups of maps are compared in Fig. 6 based on slightly changing segment lengths.

**Fig. 4** 1DP, 2DPQ, and 1DQ maps on *m* = {8*,* 16*,* 128}*,r* = 0; **a**–**c** 1DP maps; **d**–**f** 2DPQ Regular maps; **g**–**i** 2DPQ Enlarged maps; **j**–**l** 1DQ maps

In Fig. 6, nine maps in 1DP and 2DP forms are shown for *m* = {125, 126, 127}, *r* = 0: (a)–(c) show three 1DP maps with different segment lengths; (d)–(f) show 2DP maps at normal size; and (g)–(i) show the same 2DP maps enlarged.

**Fig. 5** 1DP and 2DP maps on *m* = {8, 16, 128}, *r* = 0; **a**–**c** 1DP maps; **d**–**f** 2DP Regular maps; **g**–**i** 2DP Enlarged maps

#### *4.3 Changing the Lengths of Shift Displacement: 2DP Maps Change on Displacement Series, m = 128, r = {1, 2, 8}*

Two groups of maps are compared in Fig. 7 under changing shift lengths.

In Fig. 7, nine maps in 1DP and 2DP forms are shown for *m* = 128, *r* = {1, 2, 8}: (a)–(c) show three 1DP maps with different shift displacements; (d)–(f) show 2DP maps at normal size; and (g)–(i) show the same 2DP maps enlarged.

**Fig. 6** 1DP and 2DP maps on *m* = {125*,* 126*,* 127}*,r* = 0; **a**–**c** 1DP maps; **d**–**f** 2DP Regular maps; **g**–**i** 2DP Enlarged maps

#### *4.4 Enlarged Maps: 2DP Maps on m = {125, 127, 128}, r = {0, 8}*

Enlarged 2DP maps are selected in Figs. 8 and 9.

In Fig. 8, four maps in the 2DP form are shown for *m* = {125, 127, 128}, *r* = {0, 8}: (a) *r* = 0, *m* = 125; (b) *r* = 0, *m* = 127; (c) *r* = 0, *m* = 128; and (d) *r* = 8, *m* = 128. All four are 2DP maps at enlarged size.

In Fig. 9a and b, two maps of speckle patterns from two distinct sources are selected for comparison: (a) a larger map in the 2DP form generated with *m* = 128, *r* = 0; (b) a larger version of Fig. 1d, showing a laser beam reflected from a plastic surface onto a wall. This allows readers to observe the two speckle pattern maps in refined detail.

**Fig. 7** 1DP and 2DP maps on *m* = 128*,r* = {1*,* 2*,* 8}; **a**–**c** 1DP maps; **d**–**f** 2DP Regular maps; **g**–**i** 2DP Enlarged maps

#### **5 Result Analysis**

#### *5.1 Figures 3, 4 and 5*

In Figs. 3, 4, and 5, six maps are listed in each of the 1DP (Figs. 4a–c and 5a–c) and 1DQ (Figs. 3a–c and 4j–l) forms; their distributions generally correspond to binomial coefficients. As the segment length changes, the 1D maps show binomial patterns as symmetric bell curves with the maximal value in the middle area.

From Figs. 3 and 5, six 2DQ maps (Fig. 3d–i) and six 2DP maps (Fig. 5d–i) are listed. When *m* = {8, 16}, significantly regular distributions along both the horizontal and vertical directions (Figs. 3d–h and 5d–h) appear as symmetric patterns, and the central cluster, located at the center point of each map, collects the largest number of measures. However, in the maps of Figs. 3f–i and 5f–i, the regular patterns with central symmetry are severely destroyed when the segment length is increased to *m* = 128. The two maps in Figs. 3f and 5f both show circular disks whose central position has the highest number of collected measures. However, the two enlarged maps in Figs. 3i and 5i clearly show significant speckle patterns around the central areas, with stochastically higher numbers of measures. Comparing the two, Fig. 5i shows much more visible asymmetry than Fig. 3i.

**Fig. 8** 2DP larger maps on *m* = {125, 127, 128}, *r* = {0, 8}; **a** *r* = 0, *m* = 125 map; **b** *r* = 0, *m* = 127 map; **c** *r* = 0, *m* = 128 map; **d** *r* = 8, *m* = 128 map

Because a 2DQ map covers only about a quarter of the area of a 2DP map, the damage to its symmetric properties appears much weaker than on the 2DP map. With a sufficiently large segment length, random speckle patterns are observed in the central areas and the visible symmetric properties are significantly damaged.

In general, for small segment lengths, the middle area of a 2DP map shows an approximately rotational symmetry. When the segment length is large enough, significant speckle patterns with stronger stochastic properties emerge in the central area.

In the 2DPQ maps of Fig. 4d–i, when *m* = {8, 16}, a single central point appears as the key cluster collecting the maximal number of measures, with visible symmetric patterns in the horizontal direction but none in the vertical direction (Fig. 4d–h). However, when *m* = 128, the 2DPQ map of Fig. 4f appears as an irregular disk with higher values in the central area.

The enlarged 2DPQ map of Fig. 4i shows stochastic speckle patterns in the central area, with better symmetry in the horizontal direction than in the vertical direction, where details are significantly damaged.

#### *5.2 Figure 6*

In Fig. 6a–i, nine maps are listed to show small changes in the segment length, *m* = {125, 126, 127}. In the three 1DP maps of Fig. 6a–c, the three middle areas appear slightly different from the bell shape: (a) left is higher than right; (b) right is higher than left; (c) right is higher than left and the middle one is lower than its nearest neighbors.

The three 2DP maps in (d)–(f) appear clearly as circular disks with an approximate symmetry and higher clusters around the central areas. In the three enlarged 2DP maps in (g)–(i), various speckle patterns appear in the central areas.

Comparing the six maps of (a)–(c) and (g)–(i), the speckle patterns in the three 2DP maps (g)–(i) are much easier to identify than the broken curve patterns in the three 1DP maps (a)–(c).

#### *5.3 Figure 7*

In Fig. 7a–i, nine maps are listed to analyze changes of the parameters *m* = 128, *r* = {1, 2, 8}. In the three 1DP maps of Fig. 7a–c, the middle areas appear slightly different from the regular bell shape: (a) left is lower than middle and middle is equal to right; (b) left and right are lower than middle, and right is higher than left; (c) left, middle, and right are equal.

The three 2DP maps in (d)–(f) appear as similar circular disks with an approximate symmetry and higher clusters around the central areas. In the three enlarged 2DP maps (g)–(i), various distinguishable speckle patterns appear in the central areas.

Comparing the six maps of (a)–(c) and (g)–(i), the distinguishable speckle patterns in the three 2DP maps (g)–(i) are much easier to identify than the broken curve patterns in the three 1DP maps (a)–(c).

#### *5.4 Figures 8–9*

In Fig. 8a–d, four enlarged 2DP maps are listed for the parameters *m* = {125, 127, 128}, *r* = {0, 8}. Three maps (a)–(c) are created with *m* = {125, 127, 128}, *r* = 0 and two maps (c)–(d) with *m* = 128, *r* = {0, 8}. The four larger 2DP maps in (a)–(d) show stronger, distinguishable speckle patterns in their central areas, with significant distributions that differ through mixed reflection and rotational effects.

In Fig. 9a–b, two enlarged maps of speckle patterns are selected. The map (a) with *m* = 128*,r* = 0 provides refined details to illustrate stochastic speckle patterns in the central area and the map (b) with *m* = 128*,r* = 8 has the same segment length, but a different shift length. The highest color clusters of the map (b) appear more compact and simpler than the highest color clusters of the map (a). The two maps are showing different speckle patterns as a result of simple geometric transformations.

By comparing the two enlarged speckle pattern maps, significant similarities and differences in details could be recognized.

#### **6 Conclusion**

For any 0–1 sequence with *N* elements, the variant map system processes multiple segments and transforms each segment into a pair of measures. Using a cryptographic sequence generated from the AES cipher, five statistical maps were created. The two 1D maps show binomial distributions and are referred to as classical maps; the three 2D maps are constructed as variant maps. For smaller segment lengths, both classical and variant maps were illustrated in four groups. As the segment length increases, significant speckle patterns are observed. From a brief comparison of the two larger maps, the enlarged 2DP maps in Fig. 9a, b show more refined visual details than the smaller maps.

For the 2DPQ map, significant horizontal symmetries are observed; however, there is no reflection effect in the vertical direction.

From the 2DP maps with parameters *m* = {125, ..., 128}, significant changes are observed: various speckle patterns develop as the segment length and the shift displacement change. Enlarged maps make stochastic speckle patterns clearly visible. Some significant clusters appear with speckle patterns associated with different control parameters in the relevant maps.

From the viewpoint of system operation, two types of control parameters, the segment length and the shift length of the sequence, provide an effective control mechanism for forming clear speckle patterns on 2D distributions. More attention should be paid to exploring these issues systematically in further refined research.

The variant map system differs from two existing technologies: extracting information from speckle patterns to form random sequences, and the NIST 800-22 statistical testing package, which uses a single P-value measurement or a list of static parameters for evaluation. The variant framework provides five maps that identify complicated measurements through detailed speckle patterns for any cryptographic sequence. The three refined 2D maps describe nonlinear dynamic behavior more precisely than the two 1D maps and offer possible quantitative measurements.

In relation to the variant map system, future explorations on both theoretical foundation and key applications on cryptographic sequences are urgently required.


# **Stationary Randomness of Three Types of Six Random Sequences on Variant Maps**

**Jeffrey Zheng, Yamin Luo, Zhefei Li and Chris Zheng**

**Abstract** Various random streams have different stationary properties. It is necessary to use statistical probability and time series to evaluate the quality of stationary randomness. In this chapter, a testing model is used on three maps for a random sequence. The shifted sequence is divided into multiple segments that form three measuring sets. For each map, the maxima are extracted and three maximal values are identified. 2D maps represent stationary randomness. Conditions of station random/stationary sequences are investigated. Testing sets are collected from three types of six random resources: AES, DES, A5, RC4, Australian National University (ANU), and University of Science and Technology of China (USTC) (two block ciphers, two stream ciphers, and two quantum ciphers). Six random sequences are selected, and their measurements of stationary randomness are compared. Differences of only 0.0034–4.27% are recognized. Using variation ratios, the six samples fall into three variation categories: {AES, DES}, {A5, RC4}, and {ANU, USTC}. From a measuring viewpoint, all six samples show distinguishable stationary randomness properties.

J. Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

Y. Luo e-mail: 1047668416@qq.com

Z. Li e-mail: 576167164@qq.com

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014) and Yunnan Advanced Overseas Scholar Project.

J. Zheng (B) · Y. Luo · Z. Li · C. Zheng Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

C. Zheng e-mail: z@caudate.me

**Keywords** Stationary randomness · Segment · Shifted sequence · Maxima · Quantum sequence · Variation ratio

#### **1 Introduction**

In the modern cyberspace environment [1], network communication technologies play an essential role in supporting advanced developments of science, technology, and daily social life in every aspect. From the security viewpoint of network communication, Communication Security (COMSEC) systems [2] are the most important part. Every COMSEC system depends on block cipher, stream cipher, or hash technologies, and its core component is linked to a random number generator for cryptographic applications.

Quantum satellites [3] using Quantum Key Distribution (QKD) systems [4] in cryptographic applications are the most advanced ICT development for establishing ultrasecure quantum communications. For a QKD system, a truly random number generator [5], i.e., a quantum random number generator, plays a key role.

From a reliability viewpoint, it is necessary to test the degree of stationary randomness under shift operations. In this section, a list of relevant schemes is discussed: pseudorandom/truly random sequences, P\_value, statistical probability distributions, optical statistics, stationary/nonstationary properties, and variant maps.

#### *1.1 Pseudorandom Sequences from Linear Stream Ciphers*

Traditional stream ciphers [6] based on the Linear Feedback Shift Register (LFSR) structure (in military cryptography) are used as pseudorandom number generators because of their ease of implementation in simple hardware, long periods, and uniformly distributed streams. LFSR stream ciphers are the core of classical stream ciphers, supported by the mathematical theory of algebraic functions for system simulation and analysis.

However, an LFSR is a linear system, which leads to fairly easy cryptanalysis using the Berlekamp–Massey algorithm. The important LFSR-based stream ciphers A5/1 and A5/2 are used in GSM cell phones, and E0 is used in the Bluetooth protocol. From a cryptanalysis viewpoint, however, the A5/2 cipher has been broken, and both A5/1 and E0 have serious weaknesses [7, 8].

#### *1.2 Pseudorandom Sequences from Nonlinear Stream Ciphers*
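To make the linearity of an LFSR concrete, the following toy sketch generates bits from a 4-bit Fibonacci LFSR. It is an illustration only and is not a model of A5/1, A5/2, or E0. The function name `lfsr_bits`, the initial state, and the tap positions are assumptions chosen so that the recurrence is a<sub>n+4</sub> = a<sub>n</sub> ⊕ a<sub>n+1</sub>, whose characteristic polynomial x<sup>4</sup> + x + 1 is primitive and therefore gives the maximal period 2<sup>4</sup> − 1 = 15.

```python
def lfsr_bits(state, taps, count):
    """Toy Fibonacci LFSR: output state[0], then append the XOR of the tapped bits."""
    state = list(state)
    out = []
    for _ in range(count):
        out.append(state[0])
        feedback = 0
        for t in taps:                  # linear feedback: XOR of selected state bits
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


bits = lfsr_bits([1, 0, 0, 0], taps=(0, 1), count=30)
print(bits[:15])   # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(bits[:15] == bits[15:30])   # True: the stream repeats with period 15
```

Every output bit is a fixed XOR combination of earlier bits, which is the linear structure that algorithms such as Berlekamp–Massey exploit.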

The new generation of stream ciphers [9, 10] is widely used in advanced cyber communications. Three general methods are applied to improve security weaknesses in LFSR-based stream ciphers:

1. **Nonlinear Functions**: Nonlinear combination of several bits from the LFSR state [11];


A series of nonlinear algorithms has emerged [14]: nonlinear equivalence [15], evolutionary methods [12], AES cipher [16], RC4 [17], ZUC [11], cellular automata [18], and nonlinear dynamic systems [19].

The new generation of stream ciphers has been shifting from the traditional LFSR mode [6] to various nonlinear modes: NLFSR [20, 21], clock control [13], nonlinear functions [11], etc. It is essential for these ciphers to be integrated and implemented [22] to satisfy security models. However, unlike LFSRs, which have well-established linear mathematical theories and simulation tools, it is extremely difficult to apply advanced nonlinear mathematical theories, recursive models, descriptive tools, and implementation schemes [19] in nonlinear dynamic environments. How to evaluate cryptographic sequences generated by nonlinear stream ciphers is an urgent problem for modern stream/block ciphers.

#### *1.3 Truly Random Sequences from Hardware Devices*

In addition to pseudorandom sequences generated by stream ciphers, high-quality stochastic oscillators of truly random sequences are generated from special hardware devices such as laser photonics [23], nonlinear optics [24], quantum optics [25], quantum noises [26], thermal noise [27], and chaos and fractal nonlinear dynamics [28].

Since various truly random sequences are created from specific physical models with special principles and uncertain methodologies, it is extremely difficult for cryptographic researchers to make proper measurements and explore their nonlinear dynamic properties.

## *1.4 P\_value Schemes—Statistical Tests on Cryptographic Sequences*

Randomness has been explored for many years [29] through a series of statistical testing theories and methods. From a testing viewpoint, it is feasible to apply statistical testing packages to measure the randomness properties of a given cryptographic sequence. The NIST 800-22 package is a typical representative that provides more than 15 testing schemes for evaluation. Using the testing package, it is essential to check whether *P*\_value > 0.01 for the sequence. Since such a measuring scheme provides only a static property, it is difficult to use the *P*\_value parameter alone to express the complex dynamic behaviors intrinsically involved in cryptographic sequences.

Since comprehensive behaviors in nonlinear dynamics may dramatically increase the computational complexity of handling complicated dynamic properties in a multivariate environment, those dynamic behaviors are completely ignored in *P*\_value schemes.
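As an illustration of how a single *P*\_value is obtained, the sketch below implements the frequency (monobit) test, the simplest test in the NIST 800-22 package. The function name and the ten-bit sample are illustrative choices; the chapter does not state which tests were applied to its sequences.

```python
import math


def monobit_p_value(bits) -> float:
    """Frequency (monobit) test: P_value for the balance of 0s and 1s."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum the whole sequence.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    sample = [int(c) for c in "1011010101"]
    p_value = monobit_p_value(sample)
    # The sequence passes this test when P_value > 0.01.
    print(p_value, p_value > 0.01)
```

The result is one number per sequence, which illustrates the static character of the scheme: reordering or shifting the bits does not change this particular *P*\_value at all.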

#### *1.5 Multiple Statistical Probability Distributions*

When cryptographic sequences are measured segment by segment, multiple statistical probability schemes are useful for creating various distributions that illustrate complex spatial relationships.

Multivariate normal probability distributions are among the most important and powerful tools to test the stochastic characteristics of a random data sequence [30] under the framework of probability, stochastic processes, and statistics [31] for nonlinear problems. In this kind of measuring model, when a data sequence is sufficiently long, the high-dimensional probability distribution of the sequence [32] is converted into a continuous Gaussian distribution.

A typical projection model is shown in Fig. 1a: the central part shows a Gaussian surface with an unbalanced distribution in a 2D plane, rendered as *P*(*X*, *Y*) measures in pseudo-colors, and two 1D projections are shown in the horizontal *P*(*X*) and vertical *P*(*Y*) planes, respectively. In Fig. 1b, a standard Gaussian surface with symmetric shapes is illustrated, and the 2D projection of its pseudo-color map is shown in Fig. 1c with a continuous distribution of color on the map.

From the sample figures, the relationship between the projection curve and the two 1D Gaussian distributions can be observed in the multivariate normal probability environment. Multivariate Gaussian probability distributions support various schemes to analyze complex stochastic data sets of measuring sequences in many applications under continuous conditions.

#### *1.6 Photon Statistics in Quantum Optics*

Photon statistics is the theoretical and experimental study of the statistical distributions produced in photon counting experiments, used to analyze the statistical nature of photons in a light source.

Three types of statistical distributions, shown in Fig. 2, can be obtained depending on the light source [33]: Poissonian, super-Poissonian, and sub-Poissonian. The relationship between the variance and the average number of photon counts identifies the corresponding distribution: the variance equals the mean for Poissonian light, exceeds it for super-Poissonian light, and falls below it for sub-Poissonian light. Both Poissonian and super-Poissonian light are described by a semi-classical theory in which the light source is modeled as an electromagnetic wave and the atom is modeled by quantum mechanics. In contrast, sub-Poissonian light requires the quantization of the electromagnetic field for a proper description and is a direct measure of the particle nature of light.

#### *1.7 Stationary and Non-stationary Properties*

In mathematics and statistics, a stationary process is a stochastic process [34] whose joint probability distribution does not change when it is shifted in time. Consequently, parameters such as the mean and variance, if they exist, also do not change over time. Stationarity is an interesting property for many statistical procedures in time series analysis.

In 1938, Kolmogorov established the basic theorems for smoothing and predicting stationary stochastic processes [35, 36] that had major military applications during the Cold War.

In applied mathematics, the Wiener–Khinchin theorem [37–39] states that the Autocorrelation Function (ACF) of a wide-sense-stationary process has a spectral decomposition given by the power spectrum of the process. One effective way of identifying a stationary time series is the ACF plot [40]. For a stationary time series, the ACF drops to zero relatively quickly, while the ACF of nonstationary data decreases slowly [41].
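The ACF criterion can be sketched numerically. The example below is only an illustration under assumed data: Gaussian white noise stands in for a stationary series and its running sum (a random walk) for a nonstationary one; neither series comes from the chapter.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


rng = random.Random(0)                      # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

walk, total = [], 0.0
for e in noise:                             # cumulative sum of the noise
    total += e
    walk.append(total)

acf_noise = acf(noise, 20)                  # stationary: falls to about 0 at lag 1
acf_walk = acf(walk, 20)                    # nonstationary: decays slowly

if __name__ == "__main__":
    for lag in (0, 1, 5, 20):
        print(lag, round(acf_noise[lag], 3), round(acf_walk[lag], 3))
```

In a plot of the two lists, the white-noise ACF stays near zero after lag 0 while the random-walk ACF remains close to 1 over all twenty lags.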

#### *1.8 Datastreams*

#### **1.8.1 Pseudorandom Number Resources**

Four cryptographic sequences are selected: {AES, DES, A5, RC4}. For each cipher, a cryptographic sequence of 100 MB of data is collected.

{AES, DES} are block ciphers [16] run in OFB mode, which turns the block cipher output into a stream cipher keystream.

A5/1 is a stream cipher [42] based around a combination of three LFSRs with irregular clocking.

RC4 is a stream cipher [43] designed by Ron Rivest in 1987. The design of RC4 avoids the use of LFSRs, its structure is ideal for software implementation, and it requires only byte manipulations.
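The byte-oriented structure of RC4 can be seen in the following minimal sketch of its two standard phases, key scheduling and keystream generation. The function name and the key `b"Key"` are illustrative; the key and parameters used to produce the chapter's 100 MB sequence are not given in the text.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n keystream bytes of RC4 for the given key."""
    # Key-scheduling phase: permute the 256-byte state under the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Generation phase: one swap and one table lookup per output byte.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    print(rc4_keystream(b"Key", 9).hex())
```

Only additions modulo 256, swaps, and table lookups appear, with no shift register, which matches the description of RC4 as a design suited to software.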

#### **1.8.2 Two Quantum Random Number Resources**

Reliable and unbiased random numbers are important in cryptographic applications. Many algorithms can be used to generate pseudorandom numbers, but their output can never be perfectly random or nondeterministic.

Quantum random numbers can be generated from a physical quantum source such as coherent laser light by splitting a beam of light into two beams and then measuring the power in each beam. The light intensity in each beam fluctuates about its mean. Those fluctuations can be converted into a source of random numbers [44–46] that follows a stationary Poisson distribution.

Two quantum cryptographic resources are selected: {ANU, USTC}. For each quantum cipher, a truly random sequence of 1GB data streams is collected.

USTC resource: In the Key Laboratory of Quantum Information, USTC, CAS, true random number sequences are generated [45]. This type of true random sequences supports advanced quantum communication devices of QKD systems [47, 48].

More than 20GB quantum random number sequences are provided by USTC for randomness testing.

ANU resource: The ANU Quantum Random Numbers Server is an open website [49] to offer true random numbers to anyone on the Internet. Such random numbers are generated in real time by measuring the quantum fluctuations of the vacuum. The electromagnetic field of the vacuum exhibits random fluctuations in phase and amplitude at all frequencies. By carefully measuring these fluctuations, ultra-high bandwidth random numbers can be generated. Relevant data streams are downloaded.

#### *1.9 Variant Framework*

The conjugate classification [50] is proposed to apply seven measures in a hierarchy to partition the kernels of four regular plane lattices on *n* = {4, 5, 7, 9} cases for 2D binary images. For 1D cellular automata sequences, global random behaviors [51] are visualized in 2D maps.

Various schemes following a top-down strategy have been explored for many years in combinatorial algorithms [52] and in image analysis and processing. They use multiple measures to partition special phase spaces from a top state set into multiple bottom states via the levels of a hierarchy.

For *n*-tuple bit vectors, the variant logic framework [53] is proposed, and various applications are explored: 3D visual method on random number sequences [54], variant Pseudorandom Number Generator (PRNG) [55, 56], computational simulation on quantum interactions [57, 58], noncoding DNA analysis [59], and bat echolocation [60].

#### *1.10 Proposed Scheme*

To test stationary randomness on six cryptographic sequences, we propose a testing system for a stationary random sequence of length *N*. The sequence is divided into *M* segments of a given length *m*. From each 0–1 segment, a 2-tuple of measures is extracted: the number of 1 elements and the number of 01 patterns in the segment. The paired measures form an ordered measuring set of *M* pairs.

The pairs of the measuring sequence are directly separated into two independent measuring sequences, each keeping its parameter in the same order. A total of three sequences of distinct measures are constructed: two sequences of single measures and one sequence of 2-tuple measures.

Following this approach, the two single measuring sequences are sorted into two 1D numeric arrays that serve as statistical histograms (1D maps), and the 2-tuple measuring sequence is sorted into a 2D integer array that serves as a statistical histogram (a 2D map). By controlling the shift displacement, multiple results of the three measuring sequences are transformed into 1D statistical histograms and 2D pseudo-color maps that show effective patterns of the generated sequence under various positions and conditions on a list of shift operations.

#### *1.11 Organization of the Chapter*

This chapter describes a testing system for a stationary random sequence. Section 2 presents diagrams of the system architecture and the core modules with their input/output and processing functions. In Sect. 3, the relationships among the measuring sequences and the three statistical distribution maps are analyzed. In Sect. 4, four random sequences are generated from the {AES, DES, A5, RC4} ciphers, and two quantum cryptographic sequences are collected from the Key Laboratory of Quantum Information, USTC, CAS, and the ANU quantum random number site. Based on the visual maps in Sect. 4, numeric analysis and a brief comparison are carried out in Sect. 5. Finally, in Sect. 6, the main results are summarized.

#### **2 Testing System**

To describe the testing system, diagrams are shown in Fig. 3.

**Fig. 3** The architecture of testing stationary random sequences

#### *2.1 System Architecture*

This system is composed of five parts: Input, Shifted Transformation (ST), Segment Measurement (SM), Combinatorial Projection (CP), and Output.

The input of the testing system is a selected 0–1 sequence. It is processed by the ST, SM, and CP modules in turn, and the output is composed of three maps for visual distributions (two in 1D and one in 2D) and three maximals.

#### *2.2 Core Modules*

The testing system consists of three modules: {ST, SM, CP}.

**Input**: *X*, an *N* = *m* ∗ *M* bit sequence; *m*, the segment length; *M*, the total number of segments; *r*, the shift length.

**Output**: Three maps {1DP, 1DQ, 2DPQ}; three maximals $\{\text{1DP}\_x, \text{1DQ}\_x, \text{2DPQ}\_x\}$.

**Process**: Shift *X* by *r* positions to obtain *Y* = *X*(*r*) in ST. Build the segment measuring sequences in SM; then, in CP, project the three measuring sequences as three maps and extract the three maximals.

Let *X* and *Y* be 0–1 sequences with *N* elements. The ST module takes the sequence *X* as input and cyclically shifts the whole sequence by *r* positions (right for + or left for −) to produce the shifted sequence *Y* = *X*(*r*).

$$\begin{aligned} Y &= X(r), \quad Y[I] = X[(I \pm r) \bmod N], \\ &0 \le I < N; \quad X[I], Y[I] \in \{0, 1\} \end{aligned} \tag{1}$$
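The shifted transformation of Eq. (1) can be sketched as follows. The function name is hypothetical, and the sign convention (a positive *r* reads from index *I* + *r*) is an assumption, since the text allows either direction.

```python
def shift_transform(x, r):
    """Cyclic shift: Y[I] = X[(I + r) mod N]; a negative r shifts the other way."""
    n = len(x)
    return [x[(i + r) % n] for i in range(n)]


if __name__ == "__main__":
    x = [int(c) for c in "0011010010"]
    print(shift_transform(x, 1))
    print(shift_transform(x, -1))
```

Since the shift is cyclic, no bits are lost: shifting by *N* returns the original sequence, and shifting by *r* and then by −*r* is the identity.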

The SM module takes the shifted sequence as input and divides it into *M* segments. The element at the *j*-th position, 0 ≤ *j* < *m*, of the *i*-th sub-vector, 0 ≤ *i* < *M*, is denoted $Y\_{i,j}$.

After the segmenting operation, the sub-vectors form a matrix with *M* rows and *m* columns; the *m* positions of the *i*-th row vector correspond to a pair of 2-tuple measures $(p\_i, q\_i)$.

$$Y = \{Y\_i\}\_{i=0}^{M-1} \tag{2}$$

$$Y\_i = \{Y\_{i,0}, Y\_{i,1}, \dots, Y\_{i,j}, \dots, Y\_{i,m-1}\} \tag{3}$$

$$0 \le i < M, 0 \le j < m$$

$$Y\_i \to (p\_i, q\_i), 0 \le i < M \tag{4}$$

$$\{Y\_i\}\_{i=0}^{M-1} \to \{(p\_i, q\_i)\}\_{i=0}^{M-1} \tag{5}$$

The pair of 2-tuple measures $(p\_i, q\_i)$ is determined by the following formulas:

$$\begin{aligned} Y\_{i,j} &= Y[J] \in \{0, 1\}; J = i \ast m + j, \\ &0 \le i < M, 0 \le j < m, 0 \le J < m \ast M \end{aligned} \tag{6}$$

$$p\_i = \sum\_{j=0}^{m-1} Y\_{i,j}, Y\_{i,j} \in \{0, 1\}, 0 \le p\_i \le m;\tag{7}$$

$$\begin{aligned} q\_i &= \sum\_{j=0}^{m-1} [(Y\_{i,j-1}, Y\_{i,j}) == (0, 1)], \\ &j - 1 \pmod{m}, \quad 0 \le q\_i \le \lfloor m/2 \rfloor; \end{aligned} \tag{8}$$

For example, for *X* = 0011010010 with *N* = 10, *M* = 2, *m* = 5 and no shift, the two segments are 00110 and 10010, which give $(p\_0 = 2, q\_0 = 1)$ and $(p\_1 = 2, q\_1 = 2)$.
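The measures of Eqs. (7) and (8) can be checked with a short sketch. The function name is hypothetical; the cyclic reading of position *j* − 1 follows the (mod *m*) condition in Eq. (8).

```python
def segment_measures(y, m):
    """Split y into segments of length m and return the list of (p_i, q_i) pairs."""
    if len(y) % m != 0:
        raise ValueError("sequence length must be a multiple of m")
    pairs = []
    for start in range(0, len(y), m):
        seg = y[start:start + m]
        p = sum(seg)                                   # number of 1 elements
        # Number of 01 patterns; seg[j - 1] wraps to the last bit when j == 0.
        q = sum(1 for j in range(m) if seg[j - 1] == 0 and seg[j] == 1)
        pairs.append((p, q))
    return pairs


if __name__ == "__main__":
    x = [int(c) for c in "0011010010"]
    print(segment_measures(x, 5))   # [(2, 1), (2, 2)]
```

The second segment, 10010, contains one 01 pattern in its interior and a second one across the wrap-around from its last bit to its first, which is why the second pair has a 01 count of 2.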

The SM outputs the ordered *M* pairs of 2-tuple measures $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$.

The CP module consists of two units: split and projection. The split unit takes the SM's output as its input, and the 2-tuple measuring sequence $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$ is split into two independent measuring sequences, $\{p\_i\}\_{i=0}^{M-1}$ and $\{q\_i\}\_{i=0}^{M-1}$, keeping the original order invariant.

The three measuring sequences are $\{p\_i\}\_{i=0}^{M-1}$, $\{q\_i\}\_{i=0}^{M-1}$, and $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$.

The projection unit consists of three steps: Project Array (PA), Color Map (CM), and Get Maximal (GM). For three measuring sequences, two types of 1D and 2D measures will be processed separately.

The PA processes measuring sequences to transform them into integer arrays and the CM will organize them on either normalized histograms (1D measures) or color maps (2D measures), respectively.

The 1D measures involve two measuring sequences: $\{p\_i\}\_{i=0}^{M-1}$ and $\{q\_i\}\_{i=0}^{M-1}$. Let $P[m + 1]$, $Q[\lfloor m/2 \rfloor + 1]$ and $NP[m + 1]$, $NQ[\lfloor m/2 \rfloor + 1]$ be the 1D integer and floating-point arrays, respectively, that represent the corresponding elements.

The 1DP statistic histogram is generated from the sequence $\{p\_i\}\_{i=0}^{M-1}$ using two arrays, *NP* (floating point) and *P* (integer), each with (*m* + 1) elements. With *NP*[*j*] and *P*[*j*], 0 ≤ *j* ≤ *m*, denoting the *j*-th elements and $\text{1DP}\_x$ the maximal element, the output is obtained by the following procedure:

$$\begin{array}{l} \text{Initialization: } \forall\, NP[j] = 0.0,\ P[j] = 0,\ 0 \le j \le m; \\ \text{Calculation: } for(i = 0; i < M; i++)\ \{P[p\_i]++;\} \\ \text{Normalization: } for(j = 0; j \le m; j++)\ \{NP[j] = P[j]/M;\} \\ \text{Get Maximal: } \text{1DP}\_x = \max\{NP[j] \mid 0 \le j \le m\} \end{array}$$

In the 1DP map, the PA corresponds to initialization and calculation, the CM handles normalization, and the GM identifies the maximal element of the map.

The 1DQ statistic histogram is generated from the sequence $\{q\_i\}\_{i=0}^{M-1}$ using two arrays, *NQ* (floating point) and *Q* (integer), each with $(\lfloor m/2 \rfloor + 1)$ elements. With *NQ*[*j*] and *Q*[*j*], $0 \le j \le \lfloor m/2 \rfloor$, denoting the *j*-th elements and $\text{1DQ}\_x$ the maximal element, the output is obtained by the following procedure:

$$\begin{array}{l} \text{Initialization: } \forall\, NQ[j] = 0.0,\ Q[j] = 0,\ 0 \le j \le \lfloor m/2 \rfloor; \\ \text{Calculation: } for(i = 0; i < M; i++)\ \{Q[q\_i]++;\} \\ \text{Normalization: } for(j = 0; j \le \lfloor m/2 \rfloor; j++)\ \{NQ[j] = Q[j]/M;\} \\ \text{Get Maximal: } \text{1DQ}\_x = \max\{NQ[j] \mid 0 \le j \le \lfloor m/2 \rfloor\} \end{array}$$

Using the *P*, *NP*, *Q*, and *NQ* arrays, it is possible to generate the corresponding 1D statistical histograms as 1D maps.

In the 1DQ map, the PA corresponds to initialization and calculation, the CM handles normalization, and the GM identifies the maximal element of the map.

The 2D measures process one measuring sequence: $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$. Let $PQ[m + 1 : \lfloor m/2 \rfloor + 1]$ be a 2D integer array.

The 2DPQ statistic histogram is generated from the sequence $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$ using *PQ*, a 2D integer array with $(m + 1) \ast (\lfloor m/2 \rfloor + 1)$ elements. With *PQ*[*i*, *j*], $0 \le i \le m$, $0 \le j \le \lfloor m/2 \rfloor$, denoting the (*i*, *j*)-th element and $\text{2DPQ}\_x$ the maximal element, the values are obtained by the following procedure:

$$\begin{array}{l} \text{Initialization: } \forall\, PQ[i, j] = 0,\ 0 \le i \le m,\ 0 \le j \le \lfloor m/2 \rfloor; \\ \text{Calculation: } for(i = 0; i < M; i++)\ \{PQ[p\_i, q\_i]++;\} \\ \text{Pseudo-color: Matching proper color for } \forall\, PQ[i, j],\ 0 \le i \le m,\ 0 \le j \le \lfloor m/2 \rfloor \\ \text{Get Maximal: } \text{2DPQ}\_x = \max\{PQ[i, j] \mid 0 \le i \le m,\ 0 \le j \le \lfloor m/2 \rfloor\} \end{array}$$

In the 2DPQ map, the PA corresponds to initialization and calculation, the CM handles pseudo-color, and the GM identifies the maximal element of the map.

Through the CP module, the three measuring sequences are transformed into two 1D arrays and one 2D array with (*m* + 1), $(\lfloor m/2 \rfloor + 1)$, and $(m + 1) \ast (\lfloor m/2 \rfloor + 1)$ clusters.

The outputs of the testing system are three maps {1DP, 1DQ, 2DPQ} and three maximals $\{\text{1DP}\_x, \text{1DQ}\_x, \text{2DPQ}\_x\}$ as expected statistic distributions and representatives of the input 0–1 sequence, respectively.
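The three projection procedures can be combined into one pass over the measuring sequence, as in the following sketch. The function name and return layout are hypothetical, and the pseudo-color step is omitted because it concerns only display.

```python
def combinatorial_projection(pairs, m):
    """Build the 1DP, 1DQ and 2DPQ arrays and their maximals from (p, q) pairs."""
    total = len(pairs)                       # M, the number of segments
    half = m // 2                            # floor(m / 2)
    P = [0] * (m + 1)
    Q = [0] * (half + 1)
    PQ = [[0] * (half + 1) for _ in range(m + 1)]

    for p, q in pairs:                       # calculation step
        P[p] += 1
        Q[q] += 1
        PQ[p][q] += 1

    NP = [count / total for count in P]      # normalization step
    NQ = [count / total for count in Q]

    # Get Maximal: normalized for the 1D maps, raw counts for the 2D map.
    maximals = (max(NP), max(NQ), max(max(row) for row in PQ))
    return NP, NQ, PQ, maximals


if __name__ == "__main__":
    # The two pairs from the worked example X = 0011010010 with m = 5.
    NP, NQ, PQ, maximals = combinatorial_projection([(2, 1), (2, 2)], 5)
    print(NP, NQ, maximals)
```

As in the procedures above, the two 1D maximals are taken from the normalized arrays, whereas the 2D maximal is the largest raw count in *PQ*.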

#### **3 Association Analysis**

Sorting the $\{p\_i\}\_{i=0}^{M-1}$ measuring sequence into a 1D histogram is a counting scheme. When the measuring sequence meets ideal conditions, the 1D statistical distribution is a binomial distribution.

**Lemma 1** *For an input 0–1 sequence, if the total number of segments is equal to* $M = 2^m$ *and each segment of m bits appears only once in the sequence, then the 1DP array satisfies the binomial distribution*

$$P[i] = \binom{m}{i}, 0 \le i \le m \tag{9}$$

**Corollary 1** *If the input sequence meets the conditions of Lemma 1, then the total number of items in the 1DP array is equal to*

$$\sum\_{i=0}^{m} P[i] = 2^m = M \tag{10}$$

**Lemma 2** *If the input sequence meets the conditions of Lemma 1, then the 1DQ array satisfies the following relation:*

$$Q[i] = 2\binom{m}{2i}, 0 \le i \le \lfloor m/2 \rfloor \tag{11}$$

**Corollary 2** *If the input sequence meets the conditions of Lemma 1, then the total number of items in the 1DQ array is equal to*

$$\sum\_{i=0}^{\lfloor m/2 \rfloor} Q[i] = 2^m = M \tag{12}$$
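The counting relations in Lemmas 1 and 2 and their corollaries can be checked by brute force for small *m*. The sketch below enumerates every *m*-bit segment exactly once, which is the ideal condition of Lemma 1; the function name is hypothetical.

```python
from itertools import product
from math import comb


def exhaustive_histograms(m):
    """Count the p and q measures over all 2**m segments of length m."""
    P = [0] * (m + 1)
    Q = [0] * (m // 2 + 1)
    for seg in product((0, 1), repeat=m):
        p = sum(seg)
        # Cyclic 01 count: seg[j - 1] wraps to the last bit when j == 0.
        q = sum(1 for j in range(m) if seg[j - 1] == 0 and seg[j] == 1)
        P[p] += 1
        Q[q] += 1
    return P, Q


if __name__ == "__main__":
    for m in (5, 6):
        P, Q = exhaustive_histograms(m)
        print(m, P == [comb(m, i) for i in range(m + 1)],
              Q == [2 * comb(m, 2 * i) for i in range(m // 2 + 1)],
              sum(P) == sum(Q) == 2 ** m)
```

The factor 2 in Lemma 2 has a simple reading: a cyclic segment with *i* patterns of 01 has 2*i* positions where the bit value changes, chosen from *m* positions, and two possible values for the first bit.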

**Corollary 3** *For any 0–1 sequence with N elements, the projections of a 2DPQ array in its two directions correspond to a 1DP array and a 1DQ array, respectively.*

*Proof* A 2DPQ array is generated from a measuring sequence $\{(p\_i, q\_i)\}\_{i=0}^{M-1}$, and the array is organized as $\{PQ[i, j]\}$, $0 \le i \le m$, $0 \le j \le \lfloor m/2 \rfloor$. Projecting in the two directions gives $P[i] = \sum\_{j=0}^{\lfloor m/2 \rfloor} PQ[i, j]$, $0 \le i \le m$, and $Q[j] = \sum\_{i=0}^{m} PQ[i, j]$, $0 \le j \le \lfloor m/2 \rfloor$. So the two projections correspond to the 1DP array and the 1DQ array, respectively.

**Corollary 4** *For an arbitrary 0–1 input sequence, the total number of items in the 2DPQ array is equal to*

$$\sum\_{i=0}^{m} \sum\_{j=0}^{\lfloor m/2 \rfloor} PQ[i, j] = \sum\_{i=0}^{m} P[i] = \sum\_{j=0}^{\lfloor m/2 \rfloor} Q[j] = M \tag{13}$$

By Corollaries 3 and 4, the total count in each of the three statistic arrays is equal to the total number of segments *M*, and the 2DPQ array occupies a central position because it projects onto the other two arrays.

Let $\{\text{1DP}\_x(r), \text{1DQ}\_x(r), \text{2DPQ}\_x(r)\}$ denote the three maximals of the selected sequence for $0 \le r \le m$; the three maximal sequences are $\{\text{1DP}\_x(r)\}\_{r=0}^{m}$, $\{\text{1DQ}\_x(r)\}\_{r=0}^{m}$, and $\{\text{2DPQ}\_x(r)\}\_{r=0}^{m}$.

For a 0–1 sequence with *M* segments, if every segment of *m* bits takes the same single state, then the sequence is a circular sequence.

**Lemma 3** *For* $0 \le r \le m$*, a sequence is a circular sequence iff* $\text{1DP}\_x(r) = \text{1DQ}\_x(r) = 1$ *and* $\text{2DPQ}\_x(r) = M$*.*

*Proof* For a circular sequence, shift operations do not change the pair of measures, so only a single (*p*, *q*) value is possible.

**Theorem 1** *For a sequence with stationary random properties, it has 1DPx* (0) ··· *1DPx* (*r*) ··· *1DPx* (*m*) 1*, 1DQx* (0) ··· *1DQx* (*r*) ··· *1DQx* (*m*) 1*, or 2DPQx* (0) ··· *2DPQx* (*r*) ··· *2DPQx* (*m*) 1*.*

*Proof* In any random condition, it is necessary for pairs of {(*p*, *q*)} to have certain states significantly different from a circular sequence in either 1 or *M* condition. Under the stationary random condition, all maximals satisfy only relations under shift operations.

For a *G* map, let $G\_x$ be the average variation, $\Delta G\_x$ the region of variations, and $G^R\_x = \Delta G\_x / G\_x$ the variation ratio.

**Theorem 2** *For the* {*i*, *j*}*-th G maps* $G^i$ *and* $G^j$ *with* $G^i\_x \simeq G^j\_x$ *and variation ratios* $G^{i,R}\_x$ *and* $G^{j,R}\_x$*, the map whose variation ratio is minimal has a better stationary random property than the map whose variation ratio is maximal.*

*Proof* Since $G^R\_x = \Delta G\_x / G\_x$ and $G^i\_x \simeq G^j\_x$, the ratio is a relative measure with $\forall r,\ (\max\{G\_x(r)\} - \min\{G\_x(r)\}) / G\_x \ge 0$. Since $\min\{\Delta G^i\_x, \Delta G^j\_x\} \le \max\{\Delta G^i\_x, \Delta G^j\_x\}$, the minimal variation ratio indicates the better stationary random property.

**Corollary 5** *For different maps, it is better to compare various variation ratios relevant to the same type of distributions.*

*Proof* For various maps in the same type of distributions, the relevant $\{G\_x\}$ should satisfy the similar–equal condition.
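The three variation measures can be computed directly from a list of maximals taken over the shifts *r*. The sketch below assumes that the average variation is the arithmetic mean of the maximals and that the region of variations is their max minus min, as the proof of Theorem 2 suggests; the function name and the sample values are hypothetical and are not taken from Table 1.

```python
def variation_measures(maximals):
    """Return (average, region, ratio) for maximals collected over shifts r."""
    average = sum(maximals) / len(maximals)        # assumed: arithmetic mean
    region = max(maximals) - min(maximals)         # max minus min over r
    return average, region, region / average


if __name__ == "__main__":
    # Hypothetical maximals of two maps with similar averages.
    map_i = [0.100, 0.101, 0.099, 0.100]
    map_j = [0.100, 0.110, 0.090, 0.100]
    ratio_i = variation_measures(map_i)[2]
    ratio_j = variation_measures(map_j)[2]
    # The map with the smaller ratio is the more stationary one (Theorem 2).
    print(ratio_i, ratio_j, ratio_i < ratio_j)
```

Dividing by the average makes the ratio dimensionless, which is why Corollary 5 recommends comparing ratios only among maps of the same type of distribution.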

#### **4 Testing Results**

Four pseudorandom sequences are generated by the {A5, RC4, DES, AES} ciphers, and two quantum cryptographic sequences are selected from the ANU and USTC resources.

**Fig. 4** Six cryptographic sequences on *r* = 32 1DP, 2DPQ, and 1DQ maps

**Fig. 5** Six cryptographic sequences on *r* = 32 2DPQ maps

Typical results of testing stationary properties for six sequences on 18 maps of {1DP, 2DPQ, 1DQ} are shown in Fig. 4. Each position contains nine shift values of *r* = 32 selected. A total number of 18 maps are included. Six 2DPQ maps are shown in Fig. 5 as enlarged maps. Each map has shift values of *r* = 32, respectively.

Three variation measures $\{G\_x, \Delta G\_x, G^R\_x\}$ for the maps {1DP, 2DPQ, 1DQ} of six sequences are shown in Table 1, and their sorted orders are listed in Table 2. Twenty-four 2D maps of maximal curves for *r* = 0–128 are shown in Table 3. The three left columns contain 18 enlarged variation maps of {1DQ, 1DP, 2DPQ}, and the last column contains six variation regions of 1DQ + 1DP + 2DPQ in six 2D maps. Six enlarged 2D maps are shown in Table 4 and six larger 2D maps are shown in Table 5.

In Table 6, 49 pairs of differences for variation ratios are listed in three 7 × 7 tables to illustrate refined quantity measures on three levels. The seven diagonal entries hold trivial 0 values. For the other 42 nontrivial values, let $dG^R\_x\%$ denote the differences of $G^R\_x\%$ based on the basic variation ratios in Table 1; the differences of variation ratios among the six samples are listed. Differences of the three variation ratios $\{dQ^R\_x\%, dP^R\_x\%, dPQ^R\_x\%\}$ on seven items {∅, AES, DES, A5, RC4, ANU, USTC} are illustrated.

## **5 Result Analysis**

**Table 1** Comparisons on three variation measures for six samples

**Table 2** Possible sorted orders of three types of variation measures; (a) $G\_x\%$, (b) $\Delta G\_x\%$, (c) $G^R\_x\%$

Eighteen maps in Fig. 4 are composed of three groups. Six 1DP maps have similar bell-shaped distributions that illustrate Poissonian distributions. Six 2DPQ maps are 2D distributions. They are symmetric in the left/right direction and show a broken symmetry in the up/down direction. Pseudo-color pixels on the six maps indicate the relevant 3D shapes. Compared with the six 1DP maps, the six 1DQ maps have similar distributions with narrower bell shapes that illustrate sub-Poissonian distributions. It is possible to illustrate different maps on shift *r* = 32 for each map.

In Table 1, three pairs of maximal and minimal variation ratios are identified, and three full orders are sorted in Table 2. Compared with the $G\_x$ sorted orders, under both $\{\Delta G\_x, G^R\_x\}$ variation measures the six samples keep the same sorted orders in two groups, 1DQ and {1DP, 2DPQ}, for their min-max variation ratios. Six enlarged 2DPQ maps on shift *r* = 32 are shown in Fig. 5 and form three pairs {AES:DES, RC4:A5, ANU:USTC}. The two maps in each pair have similar visual distributions.

Twenty-four variation maps are shown in Table 3 in four groups. Each group contains six 2D maps. For the three groups of {1DQ, 1DP, 2DPQ} variation distributions, eighteen enlarged 2D maps show significant waveforms. For the group of 1DQ + 1DP + 2DPQ distributions, six maps show three average variations satisfying $\text{1DQ}\_x > \text{1DP}\_x > \text{2DPQ}\_x$. This fourth group of variation measures combines the three variations of 1DQ + 1DP + 2DPQ in one unified 2D map. From the six 2D maps, the stationary randomness of global variations is clearly illustrated.

In Table 4, the AES and DES maps may show high-frequency waves, and the other enlarged 2D maps have stationary properties. In Table 5, larger waves appear and more details can be identified. Although significant variations appear in different 2D maps, it is difficult to classify the samples by their variation behaviors.

**Table 3** Variation distributions of six samples

In Table 6, the differences of the three variation ratios are bounded by $0.0034 \le |dQ^R\_x\%| \le 1.73$, $0.056 \le |dP^R\_x\%| \le 3.96$, and $0.073 \le |dPQ^R\_x\%| \le 4.27$, respectively. In general, the three groups of variation ranges on differences meet $\{dQ^R\_x\%\} \subset \{dP^R\_x\%\} \subset \{dPQ^R\_x\%\}$. From a stationary testing viewpoint, 2DPQ shows the strongest distinguishing property, 1DQ has the weakest numeric property, and 1DP provides the middle identifying property.

Since three groups can be identified as {AES, DES} block ciphers, {A5, RC4} stream ciphers, and {ANU, USTC} quantum ciphers, stationary randomness quantities can be classified into three categories, {AES, DES}-highest, {A5, RC4}-middle, and {ANU, USTC}-lowest, which provide distinct variation measures in the testing. The three quantity categories may correspond to artificial, semi-artificial, and natural designs among the generating mechanisms of cryptographic resources.

Across all differences of variation ratios on the six samples listed in Table 6, only differences of 0.0034–4.27% (thirty-four in one million to about four percent) are recognized. From a measuring viewpoint, all six samples show distinct stationary randomness properties.


**Table 6** Differences of variation ratios among three maximals of six samples

#### **6 Conclusion**

It is feasible to evaluate the stationary properties of a random sequence using the testing system. Using the three maps {1DP, 1DQ, 2DPQ}, a series of variation measures and their ratios are illustrated. Maximal measures are extracted for shifts *r* from 0 to *m*. For each sample, three 2D maps of variation curves provide refined characteristics to evaluate stationary randomness properties globally. Sample variation maps show similar–equal relationships among the same group of average variations. Further exploration is required to check the testing system on other applications of cryptographic streams. The three quantity categories of artificial, semi-artificial, and natural designs may be explored to obtain intrinsic stationary randomness information through refined testing.

**Acknowledgements** Thanks to the National Science Foundation of China (61362014) and the High Level Overseas Professional Project of Yunnan Province for financial support of this project. Thanks to the Key Laboratory of Quantum Information, USTC, CAS and the ANU Quantum Random Numbers Server for the quantum cryptographic sequences.

## **References**



# Part IV Theoretical Foundation—Meta Model

TAO produced the First—[Heaven]. The First produced the Second—[Earth]. These Two produced the Third. The Third produced all things, and these turn their back upon the Yin and embrace the Yang. The intermingling of these two Afflati results in harmony. —Lao Tzu (Tao Te Ching)

Knowledge has the form of a tree, and since metaphysics is the most fundamental one of the theoretical disciplines, it represents the roots of the tree.

—Gonzalo Rodriguez-Pereyra

Meta-design is much more difficult than design; it's easier to draw something than to explain how to draw it.

—Donald Knuth

From a historical viewpoint, the meta model was developed earlier than variant logic, and it provides useful concepts and a hierarchical organization to support this new logic framework. The core paper on the concept cell (Concept Cell Model for Knowledge Representation) was published in Int. J. Inf. Acquisition 01, 149–168 (2004), World Scientific Press. In relation to the multiple probability approach, a research paper (Voting Theory for Multiple Candidates to Resolve Intrinsic Uncertain Problems of Election) was published in Journal of System Engineering Theory and Practices (Chinese) 1000-6788(2002)12-0101-10. This paper proposed a useful multiple probability model to resolve intrinsic uncertain properties in elections.

Part IV is composed of two chapters (9 and 10).

Chapter "Meta Model on Concept Cell" outlines a meta model on concept cell for knowledge representation to provide a brief core structure on this network topology scheme for three levels of knowledge clusters.

Chapter "Voting Theory for Two Parties Under Approval Rule" describes voting theory for two parties under the approval rule to show that the multiple probability model is also useful in two-party conditions.

# **Meta Model on Concept Cell**

**Jeffrey Zheng and Chris Zheng**

**Abstract** Applying network topology schemes, two types of three levels of meta knowledge representations have been established. This chapter proposes a meta model on concept cell that provides a meta organisation of knowledge in natural and artificial intelligent systems structurally.

**Keywords** Knowledge model · Meta representation · Three levels of concept lattice · Description · Procedure · Core organisation

#### **1 Introduction**

A meta model on concept cell is outlined to represent knowledge in knowledge systems (KSs). This model has novel features that are of considerable interest for knowledge representation (KR).

Polanyi proposed a knowledge model in the 1940s in which knowledge is composed of two categories: tacit and explicit [1, 2]. In the 1970s, Anderson, working in cognitive psychology, identified two further categories of knowledge: declarative and procedural [3–5]. In the early 1990s, a procedural model was proposed by Nonaka, who identified four transformations: tacit → tacit (socialisation), tacit → explicit (externalisation), explicit → explicit (combination) and explicit → tacit (internalisation) [6, 7]. In 2000, a model was proposed by Nickols to arrange four classes (tacit, explicit, procedural and declarative) into three categories: tacit, explicit and implicit. In my opinion, the Nickols model is unsatisfactory for three reasons:

J. Zheng (B)

This work was supported by Australian Commercialising Emerging Technologies, (COMET) program.

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

C. Zheng Tahto, Sydney, Australia e-mail: z@caudate.me


To address the first two weaknesses of the Nickols approach, an executable knowledge model was proposed. A triplet (tacit, implicit and explicit) is constructed as a procedural structure. In it, implicit is the middle node, linked with the two other nodes by four transformations: tacit → implicit (externalisation), implicit → explicit (retrieval), explicit → implicit (category) and implicit → tacit (internalisation). In addition, the model provides distinguishable foreground/background and human/machine knowledge interfaces [8].

To explore different KS applications from philosophy, logic and digital libraries, to gene, chemistry, software and system engineering [9–11], people arrange common concepts to construct ontology libraries and procedures as core structures [12, 13]. Advanced system modelling tools such as ARIS [14], CIMOSA [15] and IDEF [16] provide function, data and process models and ontology description capture methodologies for constructing modern intelligent knowledge systems [17]. Because many contradictions, confusions, difficulties and unclear properties exist in KR foundation levels [13, 18, 19], consistently categorising practical knowledge into tacit/explicit and procedural/declarative is extremely hard for researchers, scientists, philosophers, psychologists and knowledge workers [14–17, 20, 21].

Practical computer-aided modelling systems use pragmatic approaches to manipulate simple structures (list, tree, stack, class and component) in real applications [14–17, 21]. Usually, declarative concepts seem easier to capture than procedural concepts. Based on this, many people believe that declarative knowledge is explicit and procedural knowledge is tacit [16, 17, 22]. A radical extension of a knowledge model in KR is proposed in a concept cell that arranges knowledge in KS for natural and artificial organisation. This model can fully support the above-mentioned knowledge models to consistently identify four categories of knowledge: tacit, explicit, declarative and procedural. The model also provides a core ontology to distinguish a hierarchy of structures within the core of a concept. According to convention, the word concept is used as an equivalent to knowledge in this chapter.

#### **2 Concept Cell Model**

Let K denote a cell of concepts (a concept cell) that is composed of three parts: M membrane, N nuclei and G gel. M is a frame that provides a container to hold both N and G. G is a base description of the content and N establishes a foundation of the cell. M takes in external concepts (externals) for N from deeper levels and outputs the current content to cells at upper levels. N is composed of two components: D declarative nucleus and P procedural nucleus. To illustrate this organisation, a cell K = ⟨M, N, G⟩ is shown in Fig. 1.

**Fig. 1** A concept cell: **a** slice, **b** hierarchy

For the convenience of construction, a special lattice is employed [23]. Only directed graphs are used, similar to the popular signal flow graphs [24] used to analyse and synthesise process control [9], computer architecture [10], electric circuits [25], network topology [26–28] and dynamic systems [25, 29]. However, no lattice may contain a loop; all lattices are directed acyclic graphs [26, 28]. In a lattice, a node represents a cell and lattice links are determined by dependencies among nodes. Because the most complex part of a cell is its nuclei structure, a detailed interior organisation is necessary to explore the meanings of knowledge. To simplify, a simple cell (or a cell, if there is no confusion) is studied here, where the nuclei of the cell are composed of only one declarative lattice and one procedural lattice.

Using lattice language, a cell K is described in Fig. 2. Different graphic symbols represent distinct forms of concepts as nodes. A rounded rectangle represents a general node; an octagon is a specific node; a rectangle shows a declarative node and an oval corresponds to a procedural node. A simple lattice cell is composed of four levels. The first level is a node M that interfaces between externals and internals. The second level consists of two nodes, G and N, linked with the M node. The node G contains the base description and the node N plays a foundation role in the cell. On the third level, two nodes, D and P, link with node N. Node D contains one lattice in declarative dependency and node P contains one lattice in procedural dependency. Finally, the fourth level holds two sets of nodes linked with nodes D and P. Each of D and P contains four nodes. Within each group of four nodes, two links are associated with three nodes.

**Fig. 2** A concept cell in lattices
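The four-level organisation described above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names and the toy cell are assumptions made for this example and do not come from the chapter.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Lattice:
    """A directed acyclic lattice: node names plus (source, target) dependency links."""
    nodes: Set[str] = field(default_factory=set)
    links: Set[Tuple[str, str]] = field(default_factory=set)


@dataclass
class Nuclei:
    """N: one declarative lattice (D) and one procedural lattice (P) for a simple cell."""
    declarative: Lattice
    procedural: Lattice


@dataclass
class ConceptCell:
    """K = <M, N, G>: membrane (externals), nuclei and gel (base description)."""
    membrane: List[str]
    nuclei: Nuclei
    gel: str


# Hypothetical toy cell with three externals.
cell = ConceptCell(
    membrane=["a", "b", "c"],
    nuclei=Nuclei(
        declarative=Lattice({"a", "b", "c"}, {("a", "b"), ("b", "c")}),
        procedural=Lattice({"a1", "b1"}, {("a1", "b1")}),
    ),
    gel="toy cell for illustration",
)

# The four levels of a simple lattice cell, using the chapter's symbols.
levels = [["M"], ["G", "N"], ["D", "P"], ["C", "T", "I", "E", "L", "S", "O", "F"]]
```

Keeping D and P as two separate lattices mirrors the text, where each nucleus is ordered by its own kind of dependency.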

#### **3 Core Components**

The following four conditions can create the content of a concept cell:


An N external corresponds to a D node. A declarative dependency is employed to order all nodes of D as a declarative lattice. If two distinct nodes have declarative dependency, then the node with more general meanings is located at the first node and a declarative link connects from the first to the second. After building up declarative dependency among all nodes, D becomes a directed acyclic lattice.

Instances of an N external correspond to nodes of P satisfying procedural dependency. P is composed of sequences of nodes by instances of externals. If two instances represent two nodes, then the node that has to be handled earlier is specified as the first node and a procedural link connects two nodes from the first to the second. After all procedural dependencies are established among nodes, P is converted into a directed acyclic lattice.

(iv) Two lattices are composed of eight distinguishable node sets:

Four sets of declarative nodes C, T, I, E are identified: C core, T tacit, I implicit and E explicit, respectively.

Four sets of procedural nodes L, S, O, F are identified: L life cycle, S start, O operation and F finish.

The construction process can be explained as follows. In the first level of the kernel, M collects all externals to provide extra knowledge for its nuclei. The second level has two parts: G and N. The G node provides the base description. Because each external is mapped to a node, the number of N externals equals the number of nodes in D. A declarative dependency holds among all D nodes, which creates a directed acyclic declarative lattice. Using instances of N externals as nodes, P is assembled by linking selected nodes under procedural dependency, which finally forms P itself as a procedural lattice. Since both the declarative and the procedural lattices are organised by ordered dependencies, both are directed acyclic lattices that support wider requirements from theoretical foundations to practical applications. A simple construction example is shown in Fig. 3(i–v).

For an acyclic lattice, four distinct node sets are notable in Fig. 3(vi): the singleton, source, branch and sink node sets, borrowed from network topology [23, 26, 30]. A singleton node provides an isolated concept. A source node exports a concept. A sink node imports concept(s) and a branch node transfers concept(s) from input link(s) to output link(s). If there is only one external in N, then the singleton set contains a single node and the other three sets are empty. If there is more than one node in N, then the singleton set is empty. In this case, the source set is composed of nodes that have at least one link to another node but no link from other nodes. Consequently, each source node must link to at least one node in the branch or sink set. In contrast to the source set, the sink set collects all nodes that have links from other nodes but no link to any node. A sink node has to be the last node in a node path of a lattice, to which at least one node from the branch or source set is linked. Unlike nodes in the source and sink sets, a node in the branch set may link with at least two nodes, to and from the source, sink and branch sets. A branch node receives from other node(s) and outputs to other node(s). These sequence nodes provide connectivity among nodes.

**Fig. 3** External concepts, declarative and procedural lattices and node sets

Although the four node sets can be identified by their different connectivity, it is not convenient to use the same vocabulary to describe two distinct lattices under different dependencies. For convenience, each node set is given a proper name that indicates its specific relationship in familiar KR terms. The D lattice represents an invariant structure (in the simplest cases a tree or a list), similar to a traditional data structure hierarchy. Because a sink node is equivalent to a data item at the leaf level (the lowest location) of a data structure, the sink node has to be represented as explicit knowledge. Therefore, the sink set of D is explicit. In contrast, a source node provides invaluable knowledge from the highest level of externals. There is no link to this node, and anyone wanting to explain the meaning of the node must capture knowledge from other sources far beyond the node itself. Consequently, a source node always contains deeper meanings than can be articulated. Hence, the source set of D is tacit. Unlike nodes in the sink and source sets, a node in the branch set has connectivity from tacit node(s) above it and to explicit node(s) below it. The branch set of D represents a typical intermediate property. Consequently, the branch set of D is implicit. A singleton node provides a complete concept. The node itself is the centre of the D lattice. Therefore, the singleton node set of the D lattice is a core.

The four node sets of the P lattice satisfy different properties. The P lattice has a close relationship to process modelling, which provides a time arrow as controllable sequences. A node in the P lattice is an instance of a node in the D lattice. The singleton node set of the P lattice is not empty only if there is exactly one node in the P lattice. The singleton node set of the procedural lattice represents a complete procedure of P itself. Logically, the procedural singleton node set is a life cycle. When two or more nodes are included, three node sets of the P lattice have to link together in sequential relationships. Time-relevant sequences over a finite number of connected nodes must have distinguishable commencing and ending nodes that correspond to start and finish conditions, respectively. In addition, all intermediate nodes provide operational capacities to deliver knowledge to subsequent nodes. Consequently, the three node sets of the P lattice are called start, finish and operation, respectively.

The relative properties of the cell model and other schemes are compared in Table 1. In the table, TM represents Theoretical Model, as used in KS applications. ST denotes Structural Theory, which uses structured organisations to represent complex dependency among members. ES indicates Engineering Systems, which provide mixed theories, experiences and skills with commercial system modelling tools for pragmatic applications, especially in enterprise management, manufacturing and building industries, software and hardware systems, global communication networks, and web and Internet environments. ES applies advanced TM methodologies plus business experience and engineering skills to solve practical problems efficiently using system engineering methodologies in global business explorations.
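The identification of the four node sets by connectivity, and their renaming for the D and P lattices, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a lattice is given as a set of node names and a set of (from, to) links, and the function and dictionary names are hypothetical. Note that the sketch classifies any isolated node as a singleton, whereas the text assumes a connected lattice whenever more than one node is present.

```python
def classify_nodes(nodes, links):
    """Split the nodes of a directed acyclic lattice into the four node sets."""
    has_out = {src for src, _ in links}   # nodes with a link to another node
    has_in = {dst for _, dst in links}    # nodes with a link from another node
    sets = {"singleton": set(), "source": set(), "branch": set(), "sink": set()}
    for node in nodes:
        if node in has_in and node in has_out:
            sets["branch"].add(node)      # receives and outputs
        elif node in has_out:
            sets["source"].add(node)      # only exports
        elif node in has_in:
            sets["sink"].add(node)        # only imports
        else:
            sets["singleton"].add(node)   # isolated concept
    return sets


# Names used in the chapter for the node sets of each lattice.
D_NAMES = {"singleton": "core", "source": "tacit", "branch": "implicit", "sink": "explicit"}
P_NAMES = {"singleton": "life cycle", "source": "start", "branch": "operation", "sink": "finish"}


def name_sets(nodes, links, names):
    """Relabel the four node sets with the names of a D or P lattice."""
    return {names[kind]: members for kind, members in classify_nodes(nodes, links).items()}


# Hypothetical three-node declarative chain a -> b -> c.
d_sets = name_sets({"a", "b", "c"}, {("a", "b"), ("b", "c")}, D_NAMES)
# Hypothetical procedural lattice with a single node.
p_sets = name_sets({"only"}, set(), P_NAMES)
```

For the chain, `d_sets` places `a` in the tacit set, `b` in the implicit set and `c` in the explicit set, while the single-node procedural lattice yields a life cycle.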

From this comparison, it is clear that the existing systems most similar to the cell concept model come from enterprise modelling, which provides all functionality for the ten meta nodes from engineering practice. Other theoretical models, however, cannot support the full functionality. This property indicates the potential capacity for applying the cell concept model from theoretical foundations to practical applications. Details of the concept cell have been published [31], covering further classifications, recursive constructions, non-simple cells and sample applications for knowledge construction systems.


**Table 1** Comparisons on different models

**Ten basic symbols: {D, T, I, E, C}, { P, S, O, F, L}**

D: Declarative; T: Tacit, I: Implicit, E: Explicit, C: Core;

P: Procedural; S: Start, O: Operation, F: Finish, L: Life cycle

**Three types of models: {ST, TM, ES}**

ST: Structural theory

TM: Theoretical model

ES: Engineering system

#### **References**



# **Voting Theory for Two Parties Under Approval Rule**

**Jeffrey Zheng**

**Abstract** Two models, the Simple Ballot Model (SBM) and the Component Ballot Model (CBM), are proposed for resolving uncertainty in an election when two candidates gain the same number of votes under the approval rule. The SBM establishes a framework to support counting. To separate the two candidates, it is essential to extract additional information from the dominant valid votes. The CBM uses probability matrices, vectors and a permutation group as components. A stable voting mechanism under permutation invariance can be created to distinguish candidates. The result of the chapter establishes a voting authority to resolve uncertainty between two candidates under the approval rule.

**Keywords** Approval rule · Permutation invariant · Feature vector · Uncertainty · Voting system

**JEL Classifications** D72 · D81 · C34 · C31

# **1 Introduction**

As a common practice in a modern democratic society, voting is a practical way to resolve a contest where each candidate seeks to gain maximal support from the electors. Approval voting is a voting procedure in which electors can vote for as many candidates as they wish. Each approved candidate receives one vote and the candidate with the most votes wins. Approval voting, unlike more complicated ranking systems, is simple for electors to understand and use. This voting method is widely used today by various governments and organizations around the world (including by the United Nations to elect the secretary-general).

J. Zheng (B)

© The Author(s) 2019

This work was supported by Yunnan Advanced Overseas Scholar Project and Yunnan National Science & Technology Foundation (2004F0009R).

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_10

To sustain healthy economic and political progress in modern democratic societies, it is necessary to apply reliable and convenient voting methodologies and tools that ensure fairness, efficiency and transparency and overcome paradoxes and difficulties in elections.

#### *1.1 Brief Review of Voting Systems*

We can find interesting voting-based models and practices in many ancient stories, from Chinese literature to Roman and Greek history. Just before the French Revolution, in the French Academy, de Borda [1] and de Condorcet [2] proposed the *Borda rule* and the *Condorcet procedures*. They wanted to use new voting methods to resolve the difficulties and unfair results produced by traditional plurality-based voting rules in elections for the Academy. In the 1920s, Hotelling [3] investigated the *equilibrium* of spatial economic competition between two firms in terms of location and price. During World War II, von Neumann and Morgenstern [4] developed the *Theory of Games* using differential equations to investigate complicated competition behaviors. This theoretical foundation has had a major influence on the development of analytical methodologies and tools, from applying pre-designed strategic policies to predicting practical election outcomes. Under fairness conditions, Arrow [5] proved his famous *Impossibility Theorem*, which claims that no single election procedure can fairly decide the outcome of an election involving three or more candidates. Various ideas, methods and technologies have emerged to resolve voting difficulties [6–9].

#### *1.2 Problems in the 2000 American Election*

The most debated question in the 2000 American election (the 2K-election) was:

Do the machine-rejected ballots need to be manually recounted?

In practice, the 2K-election problem was finally decided by the votes of the nine justices of the US Supreme Court on the lawsuits from the Florida Supreme Court.

This indicates that current voting theories and vote-counting models all fall short of being an authority for resolving the problem.

Although the 2K-election was held under the plurality rule rather than the approval rule, the approval rule cannot guarantee that similar uncertainty is avoided when a large number of electors are involved. It is necessary to establish a relevant theoretical structure to avoid possible problems in the future.

#### *1.3 Structure of the Chapter*

This chapter proposes two models that construct a voting theory to resolve 2K-election-like problems and other paradoxes in voting practice. Only one voting system, under the approval rule, is considered.

In Sect. 2, a Simple Ballot Model (SBM) is proposed. Using the SBM, the separable and uncertain conditions for the ballot papers are established. To show some practical strategies and relevant problems in current voting methodologies, four additional rules (reducing error probability, merging other candidate votes, re-election, and court decision) that are commonly used in practical voting processes are discussed.

In Sect. 2.8, the error margin for the 2K-election problem is analyzed. Though voting practice is not an exact science, the error margin of 0.233% in that event still cannot be accepted as an accurate measure. Although almost 99.8% of the votes were counted as valid, there was still no way of determining the winner. Attention therefore shifted to the 0.2% of votes that had already been deemed invalid. This problem highlights that the voting system needs to improve, and that a method of extracting additional information from valid votes to separate the two candidates under uncertain conditions becomes essential.

In Sect. 3, a new voting model, the Component Ballot Model (CBM), is defined and constructed to provide the essential construction for extracting more information from votes for comparison. Based on multiple feature matrices (similar to contingency tables in classical statistics), probability feature vectors, a permutation invariant group and other advanced mathematical tools, multiple pair sets of feature index families for two candidates are constructed. This mechanism establishes a voting authority to make a decision for an election. After the mathematical definitions and constructions of the feature matrix, feature vector, probability feature vector and feature index, the most important results are summarized in the *Two-D Separable Proposition* and the *Voting Authority Proposition*.

Because only the valid votes are taken into account, the election model has intrinsic stability and delivers reliable results immediately after the election. Confusion, frustration and dissatisfaction such as those experienced in the 2K-election can be avoided.

In the light of this research, some further research directions are suggested in Sect. 4.

#### **2 Simple Ballot Model**

#### *2.1 Key Words in Election*

Key words used in an election event can be defined as follows.


The Simple Ballot Model simulates the simplest scenario of the whole voting procedure, based on all ballots directly collected from an election under the approval rule. In this scenario, each elector can create only one vote, which may select as many candidates as desired from a list of candidates.

#### *2.2 Definitions*

For an ideal **election** involving $n$ ($\ge 2$) **candidates**, let $C = \{c\_1, c\_2, \dots, c\_n\}$ be the set of selected candidates. A **ballot** $B = \langle c\_1, c\_2, \dots, c\_n \rangle$ is a pre-designed form containing the list of candidates for whom the electors may vote.

A **vote** is a record of a ballot $B$. Let $v$ denote a vote. It is valid if $v = \langle v\_1, v\_2, \dots, v\_n \rangle$, $v\_i \in \{0, 1\}$, $i \in [1, n]$, and $\sum\_{i=1}^{n} v\_i > 0$. Otherwise, if $\exists\, v\_i = x \notin \{0, 1\}$, $i \in [1, n]$, or $\sum\_{i=1}^{n} v\_i = 0$ (null selection), the vote $v$ is invalid. Here $v\_i = 1$ indicates that candidate $c\_i$ is selected, $v\_i = 0$ indicates that $c\_i$ is not selected and $v\_i = x$ indicates an invalid selection for $c\_i$. Normally a vote $v$ has a value region from $\langle 0, 0, \dots, 0 \rangle$ to $\langle 1, 1, \dots, 1 \rangle$ … $\langle x, x, \dots, x \rangle$.

An elector can only create one vote and there are a total number of *N* (*n*) votes in the election.

A poll *V* is a vote collection in which all votes can be arranged as an array with *N* entries:

$$V = (v(1), \dots, v(t), \dots, v(N)), \quad t \in [1, N]. \tag{2.1}$$

where $v(t)$ denotes the vote of the $t$th elector. As each candidate has a number, let $k \in v(t)$ denote that the $t$th elector selected the $k$th candidate on the vote.

For example, with $n = 6$ and $N = 8$, a poll $V$ is $V = (v(1), \dots, v(t), \dots, v(8))$, $t \in [1, 8]$:

$$\begin{aligned} v(1) &= \langle 0, 0, 1, 1, 1, 0, 0 \rangle, v(2) = \langle 0, 1, 0, 1, 0, 0 \rangle, v(3) = \langle 0, 1, 0, 1, 1, 1, 0 \rangle, \\ v(4) &= \langle 1, 0, x, 1, 1, 0 \rangle, v(5) = \langle 0, 1, 0, 1, 0, 0 \rangle, v(6) = \langle 0, 0, 1, 1, 1, 1, 0 \rangle, \\ v(7) &= \langle 0, 0, 1, 0, 0, 0 \rangle, v(8) = \langle 0, 0, 0, 0, 0, 0 \rangle \end{aligned}$$

In this poll, $\{v(1), v(2), v(3), v(5), v(6), v(7)\}$ are valid votes ($v\_3(1) = v\_4(1) = 1$ indicates that the first vote selected the third and fourth candidates). In addition, $v(4)$ contains an uncertain selection ($v\_3(4) = x$) and $v(8)$ is a null selection; both votes are invalid.
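The validity rule for a vote can be written as a short predicate. The sketch below is illustrative only: the function name and the three toy votes are assumptions for this example, and the string `"x"` stands for an uncertain mark.

```python
def is_valid(vote):
    """A vote is valid when every entry is 0 or 1 and at least one candidate is selected."""
    return all(entry in (0, 1) for entry in vote) and sum(vote) > 0


# Hypothetical votes for n = 6 candidates.
votes = [
    (0, 0, 1, 1, 0, 0),    # valid: two candidates approved
    (1, 0, "x", 1, 1, 0),  # invalid: contains an uncertain selection
    (0, 0, 0, 0, 0, 0),    # invalid: null selection
]
flags = [is_valid(v) for v in votes]
```

The entry check runs before the sum, so a vote containing `"x"` is rejected without attempting to add a non-numeric mark.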

Let $V\_0$ denote the invalid sub-poll in the election; it collects all invalid votes from the poll $V$. Let $Vc$ denote the valid sub-poll in the election. The sub-polls $Vc$ and $V\_0$ partition the poll $V$, i.e.

$$V = Vc \cup V\_0.$$

Let $V\_k$ denote a sub-poll in the election. For any $k \in [1, n]$, $V\_k$ collects all valid votes from the poll $V$ for the $k$th candidate.

$$V\_k = \{ v(t) | v\_k(t) = 1, k \in [1, n], t \in [1, N], v(t) \in Vc \}.$$

Let $\tilde{V}$ denote a poll vector,

$$\tilde{V} = (V\_0, V\_1, \dots, V\_k, \dots, V\_n), k \in [1, n]. \tag{2.2}$$

An SBM is a collection of a ballot form, all votes, the poll and the poll components for an election.

$$SBM = \left(B \, \middle| \, V; \tilde{V}\right) \tag{2.3}$$

Let $N\_{Vc}$ denote the number of votes in the valid poll $Vc$, $N\_{Vc} = |Vc|$. Let $N\_k$ denote the number of votes in the sub-poll $V\_k$, $N\_k = |V\_k|$, $k \in [1, n]$, and let $N\_0$ denote the number of votes in the invalid poll $V\_0$.

The total number of votes in an election, $N$, is equal to the number of valid votes $N\_{Vc}$ plus the number of invalid votes $N\_0$, i.e.

$$N = N\_{Vc} + N\_0. \tag{2.4}$$

Let $p\_{Vc} = |Vc|/|V| = N\_{Vc}/N$ denote a measure of the valid votes.

For any poll vector $\tilde{V}$, let $p\_k = |V\_k|/|V| = N\_k/N$, $1 \le k \le n$, denote a measure of the $k$th candidate and $p\_0 = |V\_0|/|V| = N\_0/N$ denote the measure of the invalid votes.
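The counts and measures defined above can be computed directly from a poll. The following is a minimal sketch, assuming votes are tuples of 0/1 entries (with `"x"` for an uncertain mark); the function names and the four-vote poll are hypothetical, and exact fractions are used so that the measures can be compared without rounding.

```python
from fractions import Fraction


def is_valid(vote):
    """Valid when all entries are 0 or 1 and at least one candidate is selected."""
    return all(entry in (0, 1) for entry in vote) and sum(vote) > 0


def poll_measures(poll, n):
    """Return (p_0, [p_1, ..., p_n]) for a poll of votes over n candidates."""
    total = len(poll)                                  # N
    valid = [v for v in poll if is_valid(v)]           # valid sub-poll
    n_invalid = total - len(valid)                     # N_0
    counts = [sum(1 for v in valid if v[k] == 1) for k in range(n)]  # N_k
    p0 = Fraction(n_invalid, total)
    p = [Fraction(c, total) for c in counts]
    return p0, p


# Hypothetical poll: n = 3 candidates, N = 4 votes, one null selection.
poll = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0)]
p0, p = poll_measures(poll, 3)
total_measure = p0 + sum(p)
```

Here the first two candidates share the first vote, so their sub-polls overlap and the measures add up to more than 1, as expected under the approval rule.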

Under the approval rule, there are many overlaps among different sub-polls. Considering two candidate sub-polls and their common part, if $\exists\, k, l \in [1, n]$, $V\_k, V\_l \subseteq Vc$, $V\_k \cap V\_l \ne \emptyset$, then

$$|V\_k \cup V\_l| = |V\_k| + |V\_l| - |V\_k \cap V\_l| \tag{2.5}$$

In general, we have

$$|V\_k \cup V\_l| \le |V\_k| + |V\_l|\tag{2.6}$$

Let $\Psi$ denote a frequency vector,

$$\Psi = (p\_0, p\_1, \dots, p\_k, \dots, p\_n), \quad k \in [1, n] \tag{2.7}$$

#### *2.3 One-Dimensional Feature Distribution*

The frequency vector $\Psi$ corresponds to a density distribution and satisfies the following relations.

$$1 = p\_{Vc} + p\_0;\tag{2.8}$$

$$1 \ge p\_k \ge 0, \quad k \in [1, n]. \tag{2.9}$$

Because there is no further partition among sub-polls, the vector forms a one-dimensional frequency feature histogram.

Considering inequalities (2.6), (2.8) and (2.9), there is an inequality.

$$1 \le \sum\_{k=0}^{n} p\_k \le n. \tag{2.10}$$

If the sub-polls partition the poll, then $1 = \sum\_{k=0}^{n} p\_k$. In the worst case, if all votes are valid and every vote selects all candidates, then

$$p\_0 = 0, \, p\_1 = \dots = p\_n = 1, \quad \sum\_{k=0}^n p\_k = n$$
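A short sketch of why (2.10) holds, offered as one possible reading rather than the chapter's own proof. It assumes only the validity condition of Sect. 2.2, under which every valid vote selects between 1 and $n$ candidates. Counting each valid vote once per selected candidate gives

$$\sum\_{k=1}^{n} N\_k = \sum\_{v(t) \in Vc} \sum\_{i=1}^{n} v\_i(t), \qquad N\_{Vc} \le \sum\_{k=1}^{n} N\_k \le n N\_{Vc}.$$

Dividing by $N$, adding $p\_0$ and using (2.8) then yields

$$1 = p\_0 + p\_{Vc} \le \sum\_{k=0}^{n} p\_k \le p\_0 + n\, p\_{Vc} \le n\,(p\_0 + p\_{Vc}) = n.$$

The lower bound is reached when every valid vote selects exactly one candidate, and the upper bound when there are no invalid votes and every vote selects all candidates.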

#### *2.4 Separable Condition*

When $\exists\, i, j \in [1, n]$ with $p\_i, p\_j > p\_0$, a decision between the candidates $i$ and $j$ can be made if and only if

$$\left|p\_i - p\_j\right| > p\_0 \tag{2.11}$$

This is the separable condition.

#### *2.5 Uncertain Condition*

However, there will be intrinsic difficulties in making a decision between the candidates $i$ and $j$ simply from their measures $p\_i$ and $p\_j$ if

$$\left|p\_i - p\_j\right| \le p\_0 \tag{2.12}$$

This is the uncertain condition.

Under the uncertain condition, there is no simple way to distinguish the signals $p\_i$ and $p\_j$ clearly under the interference of $p\_0$.
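The separable and uncertain conditions amount to a single comparison of the gap between two candidate measures with the measure of invalid votes. A minimal sketch follows; the function name and the numeric values are hypothetical and chosen only to show both outcomes.

```python
def decision_status(p_i, p_j, p_0):
    """Classify two candidate measures against the invalid-vote measure p_0.

    Returns "separable" when |p_i - p_j| > p_0 and "uncertain" otherwise.
    The text states the conditions for candidates with p_i, p_j > p_0.
    """
    return "separable" if abs(p_i - p_j) > p_0 else "uncertain"


# Hypothetical measures.
clear_case = decision_status(0.55, 0.40, 0.05)  # gap is larger than p_0
close_case = decision_status(0.48, 0.47, 0.02)  # gap is within p_0
```

In the second case the gap between the candidates is smaller than the share of invalid votes, so the outcome could change depending on how those votes are treated.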

#### *2.6 Balanced Opposites*

It is extremely hard to make any decision when both candidates gain the same number of votes in an election. However, for any equilibrium dynamic system involving two balanced opposites in competition, the most probable trend is $p\_j \approx p\_i$. In general, the more complicated the feedback mechanisms involved, the more frequently balanced events occur [10, 11].

#### *2.7 Four Additional Policies*

To resolve conflicts in an election, four additional policies may be useful: reducing error probability ($p\_0 \to 0$), merging other candidate votes ($V\_i \cup V\_l \to V\_i$ or $V\_j \cup V\_l \to V\_j$; $i, j, l \in [1, n]$), re-election (new $p\_i$, $p\_j$) and court decision.

The reducing error probability policy works well in certain conditions involving only a small number of electors. Using various controlled methods, e.g., making the total number of seats in Parliament an odd number or allowing some additional votes to Parliament Leaders, the worst case scenario in which both candidates hold equal votes without a decision can be eliminated. However, when an election involves a large number of electors, on the scale of the 2K-election, the voting system naturally becomes a complex dynamic system and there is no way to make the error margin negligible ($p\_0 \to 0$).

The merging other votes policy works in simple conditions at a single location. Combining votes for candidates from multiple locations under the approval rule would be more difficult than under plurality rules, since there are many overlaps among sub-polls. There is no guarantee that the policy will work. In the best cases, old difficulties may be temporarily solved, but new and similar uncertainties could immediately emerge.

Viewed as a complex dynamic system, a re-election is the same as the original election. Therefore, the re-election policy cannot provide an improved separable property between two candidates.

If no other solution can be found because of timing or other issues, then it is feasible to let the Courts make the decision. The court decision policy results in efficient decision-making, but it breaks down the election procedure and may lose the fairness, transparency, self-determination and other advantages of the election process.

#### *2.8 How Accurate Is Accurate?*

It is well known that all measurements in physics and in every exact science are inaccurate to some degree. So what is sufficient to be deemed accurate for an election? Can we accept a 10% margin of error as accurate? What about 1% or even 0.1%?

In real life, an error margin of 1% would be highly commendable and one of 0.1% would be considered highly accurate.

Although voting and polling were never meant to be an exact science, polls and other pre-election statistics had error margins of almost 5–10%. In the actual election, the margin of error was smaller: in the disputed counties, e.g. Miami-Dade and Palm Beach, only 14,000 votes out of a total of six million were rejected, a margin of error of only 0.233%. Usually this would be deemed negligible, as almost 99.8% of votes were valid. However, it was not enough to separate the two candidates; to do so, the number of rejected votes would have to fall from 14,000 to 100. Under that condition, an error margin of about 0.0017% (100 out of six million) is required. This is highly improbable due to cost, time and other factors.
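The percentages quoted in this section follow from simple ratios. A minimal Python check, assuming the round figures given in the text (six million votes in total, 14,000 rejected and a target of 100 rejected); the variable names are illustrative only:

```python
TOTAL_VOTES = 6_000_000    # total number of votes, as quoted in the text
REJECTED_VOTES = 14_000    # rejected votes, as quoted in the text
TARGET_REJECTED = 100      # rejected votes needed to separate the candidates

# Margins of error expressed as percentages of the total vote.
actual_margin = REJECTED_VOTES / TOTAL_VOTES * 100
target_margin = TARGET_REJECTED / TOTAL_VOTES * 100
valid_share = 100 - actual_margin

print(f"actual margin: {actual_margin:.3f}%")   # about 0.233%
print(f"valid share:   {valid_share:.1f}%")     # about 99.8%
print(f"target margin: {target_margin:.5f}%")   # about 0.00167%
```

Reaching the target would mean cutting the rejection rate by a factor of 140, which gives a sense of why the text calls this improbable.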

#### *2.9 Shifting Attentions from Invalid Votes to Valid Votes*

Almost 99.8% of the votes are valid. This indicates that, to determine the winner under the uncertain condition, it is necessary to fetch additional information from the valid votes instead of reducing the error margin by handling invalid votes. The total number of votes is far greater than the number of candidates. This makes it possible to extract additional information using cross-classification methods based on contingency table-like techniques among multiple categories. The cross-classification technique is a powerful toolkit in modern statistics [12–15].

Under additional categories such as location, age group and sex, valid votes will be categorized as two-dimensional classified feature distributions in respective contingency tables. Such spatial or histogram-like feature distributions provide invaluable information that supports improving the separable properties between two uncertain candidates. To represent this idea, a new model is proposed in the next section.
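The idea of cross-classifying valid votes can be illustrated with a small contingency table. This sketch is not the feature matrix of the Component Ballot Model, which is defined later; it only shows, on hypothetical records, how two candidates with equal totals can still differ in their distribution over a second category such as location.

```python
from collections import Counter

# Hypothetical records: one (location, candidate index) pair per approval on a valid vote.
records = [
    ("north", 1), ("north", 2), ("south", 1),
    ("south", 1), ("south", 2), ("north", 2),
]

table = Counter(records)
locations = sorted({loc for loc, _ in records})
candidates = sorted({cand for _, cand in records})

# Rows are locations, columns are candidates.
matrix = [[table[(loc, cand)] for cand in candidates] for loc in locations]

# Column sums give the one-dimensional totals used by the Simple Ballot Model.
totals = [sum(row[j] for row in matrix) for j in range(len(candidates))]
```

Both candidates receive three approvals in total, yet the rows of `matrix` differ, which is the kind of additional information a two-dimensional classification exposes.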

#### **3 Component Ballot Model**

To overcome the intrinsic complexities and uncertainty problems in approval voting practice, a new model, the Component Ballot Model, is proposed in this section; it uses multiple variables on a ballot for a better description and an easier comparison.

#### *3.1 Definitions*

To be consistent with the previous notation, similar symbols (e.g. for the ballot paper) are used. However, the contents of the ballot paper and the other notations are now compounded into vector forms.

Let $C = \{C\_1, C\_2, \dots, C\_m\}$ be a set of the selected conditions. The *i*-th item contains $n\_i$ distinct values for selection, $C\_i = \langle c\_1^i, \dots, c\_j^i, \dots, c\_{n\_i}^i \rangle$, $j \in [1, n\_i]$, $i \in [1, m]$.

A **ballot** *B* (or a **component ballot**) is a vector composed of *m* items:

$$B = \begin{pmatrix} C\_1 \\ \dots \\ C\_i \\ \dots \\ C\_m \end{pmatrix} = \begin{pmatrix} \left\langle c\_1^1, \dots, c\_{n\_1}^1 \right\rangle \\ \dots \\ \left\langle c\_1^i, \dots, c\_j^i, \dots, c\_{n\_i}^i \right\rangle \\ \dots \\ \left\langle c\_1^m, \dots, c\_{n\_m}^m \right\rangle \end{pmatrix}, \quad j \in [1, n\_i], i \in [1, m] \tag{3.1}$$

Component items in a ballot add further information about the elector to the ballot paper, such as sex, voting time, location, age group, minority status, living area, social security and employment situation.

For example, suppose the first item contains 10 candidates, the second item represents 100,000 locations, the third item has 3 sex groups (male, female, neutral), the fourth item contains 150 age groups, and the fifth item indicates $10^{10}$ social security numbers. Under these conditions, a ballot paper could be

$$B = \begin{pmatrix} C\_1 \\ C\_2 \\ C\_3 \\ C\_4 \\ C\_5 \end{pmatrix} = \begin{pmatrix} \langle c\_1^1, \dots, c\_{10}^1 \rangle \\ \langle c\_1^2, \dots, c\_{100000}^2 \rangle \\ \langle c\_1^3, c\_2^3, c\_3^3 \rangle \\ \langle c\_1^4, \dots, c\_{150}^4 \rangle \\ \langle c\_1^5, \dots, c\_{10^{10}}^5 \rangle \end{pmatrix},$$

$$m = 5, n\_1 = 10, n\_2 = 100000, n\_3 = 3, n\_4 = 150, n\_5 = 10^{10}.$$

A **vote** *v* (or a component vote) is a record of a component ballot *B* in which at least one value has been assigned for each of the *m* items:

$$\boldsymbol{v} = \begin{pmatrix} \boldsymbol{v}^{1} \\ \cdots \\ \boldsymbol{v}^{i} \\ \cdots \\ \boldsymbol{v}^{m} \end{pmatrix} = \begin{pmatrix} \langle \boldsymbol{v}^{1}\_{1}, \ldots, \boldsymbol{v}^{1}\_{n\_{1}} \rangle \\ \cdots \\ \langle \boldsymbol{v}^{i}\_{1}, \ldots, \boldsymbol{v}^{i}\_{l}, \ldots, \boldsymbol{v}^{i}\_{n\_{i}} \rangle \\ \cdots \\ \langle \boldsymbol{v}^{m}\_{1}, \ldots, \boldsymbol{v}^{m}\_{n\_{m}} \rangle \end{pmatrix}, \quad \boldsymbol{v}^{i}\_{l} \in \{0, 1, \boldsymbol{x}\}, l \in [1, n\_{i}], i \in [1, m], \tag{3.2}$$

where $n\_i$ is the number of entries of $v^i$; $v\_l^i = 1$ (or 0) means that the value $c\_l^i$ is selected (or not selected), and $v\_l^i = x$ indicates that $c\_l^i$ has an invalid value.

More items are provided on each ballot to include more information, so further distinctions of their valid regions are necessary. If, for a vote *v*, the first item ($i = 1$) satisfies $\sum\_{l=1}^{n\_1} v\_l^1 \ge 1$ (at least one value selected) and all additional items satisfy $v\_l^i \in \{0, 1\}$, $l \in [1, n\_i]$, $i \in [2, m]$, $\sum\_{l=1}^{n\_i} v\_l^i = 1$ (one and only one value selected), then the vote *v* is a **valid vote**. However, if $\exists i, l$, $v\_l^i = x$, $i \in [1, m]$, $l \in [1, n\_i]$, or if one $v^i$ among the additional items is assigned multiple values, $\exists i \in [2, m]$, $v\_l^i \in \{0, 1\}$, $\sum\_{l=1}^{n\_i} v\_l^i > 1$, $l \in [1, n\_i]$, then *v* is an **invalid vote**.

Normally the valid first item in a vote has a value region from $\langle 0, 0, \dots, 0, 1 \rangle$ to $\langle 1, 1, \dots, 1 \rangle$. A total of $2^{n\_1} - 1$ combinations are valid, allowing one, two or more candidates to be selected. However, for each additional item one and only one value is selected, from $\langle 0, 0, \dots, 0, 1 \rangle$ to $\langle 1, 0, \dots, 0, 0 \rangle$, so only $n\_i$, $i \in [2, m]$, selections are allowed.
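The validity conditions can be sketched as a small check. This is an illustration under stated assumptions: a vote is represented as a list of items, each item a list whose entries are 0, 1 or the string `"x"`; item 0 plays the role of the candidate item; and any vote that fails the valid-vote conditions is simply reported as not valid. The names `is_valid_vote` and `count_valid_first_items` are hypothetical.

```python
from typing import List, Union

Value = Union[int, str]      # 0, 1, or the marker "x" for an invalid value
Vote = List[List[Value]]     # one list of values per ballot item


def is_valid_vote(vote: Vote) -> bool:
    """Check the valid-vote conditions described above.

    Item 0 (candidates) needs at least one selection; every additional item
    needs exactly one selection; no entry may be the invalid marker "x".
    """
    for item in vote:
        if any(value not in (0, 1) for value in item):
            return False
    if sum(vote[0]) < 1:
        return False
    return all(sum(item) == 1 for item in vote[1:])


def count_valid_first_items(n1: int) -> int:
    """Number of non-empty candidate selections: 2**n1 - 1."""
    return 2 ** n1 - 1


print(is_valid_vote([[1, 1, 0], [0, 1]]))   # two candidates approved, one location
print(is_valid_vote([[1, 0, 0], [1, 1]]))   # two locations selected
print(count_valid_first_items(3))
```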

Additional information about electors may be accessed from existing election databases, and there is no technical difficulty in merging it automatically into a compound vote using modern information technology.

A vote therefore has enough room to hold the various parameters of an elector, and a total of *N* electors take part in the voting.

A **poll** *V* is a vote collection in which all votes can be arranged as an array with *N* entries:

$$V = (v(1), \dots, v(t), \dots, v(N)), \quad t \in [1, N]. \tag{3.3}$$

Considering each vote has *m* items, a poll *V* can be represented as a 2D *m*×*N* array.

$$V = (v(1), \ldots, v(t), \ldots, v(N))$$

$$= \left( \begin{pmatrix} v^1(1) \\ \cdots \\ v^i(1) \\ \cdots \\ \cdots \\ v^m(1) \end{pmatrix}, \ldots, \begin{pmatrix} v^1(t) \\ \cdots \\ v^i(t) \\ \cdots \\ \cdots \\ v^m(t) \end{pmatrix}, \ldots, \begin{pmatrix} v^1(N) \\ \cdots \\ v^i(N) \\ \cdots \\ \cdots \\ v^m(N) \end{pmatrix} \right) \quad t \in [1, N], i \in [1, m]. \tag{3.4}$$

#### *3.2 Feature Partition*

Let $Vc$ denote the valid poll and $V\_0$ the invalid poll. $Vc$ and $V\_0$ partition the poll *V*, i.e.

$$\begin{aligned} Vc &= \{ \forall v | v \text{ is a valid vote, } v \in V \}; \\ V\_0 &= \{ \forall v | v \notin Vc, \, v \in V \}; \\ V &= Vc \cup V\_0. \end{aligned} \tag{3.5}$$

Let *V<sup>i</sup>* denote a sub-poll in the election. For any *i* ∈ [1, *m*], *V<sup>i</sup>* collects all valid votes of the poll *V* for the *i*th item.

$$V^i = \left\{ \forall v(t) | v(t) \in Vc, \, v\_l^i(t) \in \{0, 1\}, \, \sum\_{l=1}^{n\_i} v\_l^i(t) \ge 1, \, l \in [1, n\_i], t \in [1, N], \, i \in [1, m] \right\} \tag{3.6}$$

**Zero-D Feature Lemma** All sub-polls $\{V^i\}\_{i=1}^{m}$ contain the same votes as the valid poll $Vc$:

$$Vc = V^1 = V^2 = \dots = V^i = \dots = V^m\tag{3.7}$$

*Proof* By Eqs. (3.5) and (3.6), a valid vote contains at least one valid value in each category, so every valid vote appears in every sub-poll $V^i$. Projecting the valid votes onto any item therefore yields the same group. □

Let $V\_k^i$ denote a sub-poll in the election. For any $i \in [1, m]$, $V\_k^i$ collects all valid votes of the poll $Vc$ whose *i*th item selects the value at position *k*.

$$V\_k^i = \left\{ \forall v(t) | v(t) \in Vc, v\_k^i(t) = 1, t \in [1, N], i \in [1, m], k \in [1, n\_i] \right\} \tag{3.8}$$

**One-D Feature Lemma** All sub-polls $\{V\_k^i\}\_{k \in [1, n\_i]}$ dissect the sub-poll $V^i$:

$$V^i = \bigcup\_{k=1}^{n\_i} V\_k^i \tag{3.9}$$

*Proof* By Eqs. (3.5)–(3.8), each valid vote has at least one identified value in the *i*th item. Collecting all votes with each value gives the result. □

**One-D Feature Corollary** If each vote contains only one value in the category item, then all sub-polls $\{V\_k^i\}\_{k \in [1, n\_i]}$ partition the sub-poll $V^i$:

$$|V^i| = \sum\_{k=1}^{n\_i} |V\_k^i|\tag{3.10}$$

*Proof* By Eq. (3.9), each vote has exactly one identified value, so there is no overlap among the sub-polls of the category item. □

Note that under the approval voting rule only the candidate category fails to satisfy the One-D feature corollary. The additional categories all satisfy its condition.

Unlike the Zero-D feature lemma, the One-D feature corollary provides a non-trivial partition of the votes into multiple sub-polls.
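The difference between the lemma (the sub-polls cover $V^i$) and the corollary (they partition it) can be seen on a toy poll. The sketch below assumes a vote is a list of 0/1 lists, uses 0-based indices instead of the 1-based indices of the text, and identifies each vote by its position *t*; the data and the helper `sub_poll` are invented for illustration.

```python
def sub_poll(valid_votes, i, k):
    """Indices t of valid votes whose item i selects value k, as in Eq. (3.8)."""
    return {t for t, vote in enumerate(valid_votes) if vote[i][k] == 1}


# Hypothetical valid poll: item 0 = 3 candidates (approval), item 1 = 2 locations.
Vc = [
    [[1, 1, 0], [1, 0]],
    [[0, 1, 0], [0, 1]],
    [[1, 0, 1], [0, 1]],
]

candidate_polls = [sub_poll(Vc, 0, k) for k in range(3)]
location_polls = [sub_poll(Vc, 1, k) for k in range(2)]

all_votes = set(range(len(Vc)))
covers_candidates = set().union(*candidate_polls) == all_votes   # Eq. (3.9)
covers_locations = set().union(*location_polls) == all_votes     # Eq. (3.9)
candidate_size_sum = sum(len(p) for p in candidate_polls)        # may exceed |V^i|
location_size_sum = sum(len(p) for p in location_polls)          # equals |V^i|, Eq. (3.10)

print(covers_candidates, covers_locations)
print(candidate_size_sum, location_size_sum, len(Vc))
```

In this toy poll the candidate sub-polls overlap because two votes approve two candidates each, so their sizes add up to more than the number of valid votes, while the location sub-polls add up exactly.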

Let $V\_0$ denote the invalid poll in the election. It collects all invalid votes of the poll *V*.

$$V\_0 = \{ \forall v(t) | v(t) \notin Vc, t \in [1, N] \}\tag{3.11}$$

Since there is no further distinction among the votes in $V\_0$, all votes in this poll correspond to discarded votes.

Let $V\_{k,l}^{i,j}$ denote a sub-poll. It can be described as

$$V\_{k,l}^{i,j} = \left\{ \forall v(t) | v(t) \in Vc, \, v\_k^i(t) = 1, \, v\_l^j(t) = 1; \ t \in [1, N], i, j \in [1, m], k \in [1, n\_i], l \in [1, n\_j] \right\} \tag{3.12}$$

For any $i, j \in [1, m]$, $k \in [1, n\_i]$, $l \in [1, n\_j]$, the votes collected in $V\_{k,l}^{i,j}$ are the same as the votes in $V\_{l,k}^{j,i}$.

If $l \neq k$, then the votes in $V\_{k,l}^{i,j}$ are different from the votes in $V\_{k,l}^{j,i}$.

**Two-D Feature Lemma** All votes in $\{V\_{k,l}^{i,j}\}\_{k \in [1, n\_i], l \in [1, n\_j]}$ dissect either $V\_k^i$ or $V\_l^j$:

$$V\_k^i = \bigcup\_{l=1}^{n\_j} V\_{k,l}^{i,j};\tag{3.13a}$$

or

$$V\_l^j = \bigcup\_{k=1}^{n\_i} V\_{k,l}^{i,j}.\tag{3.13b}$$

*Proof* By Eq. (3.12) and the One-D feature lemma, each vote in the sub-polls has other identified values. Collecting all votes with the value in the relevant sub-polls gives the result. □

**Two-D Feature Corollary** If a valid vote contains a single value in the selected category item, then all votes in $\{V\_{k,l}^{i,j}\}\_{k \in [1, n\_i], l \in [1, n\_j]}$ partition either $V\_k^i$ or $V\_l^j$. For the *j* category,

$$\left|V\_k^i\right| = \sum\_{l=1}^{n\_j} \left|V\_{k,l}^{i,j}\right|;\tag{3.13c}$$

or, for the *i* category,

$$\left| V\_l^j \right| = \sum\_{k=1}^{n\_i} \left| V\_{k,l}^{i,j} \right|. \tag{3.13d}$$

*Proof* When each vote in the sub-polls has only a single value in relation to the selected category item, the sub-polls partition the selected poll. □

Under this construction, all votes in $\{V\_{k,l}^{i,j}\}$, $i, j \in [1, m]$, $k \in [1, n\_i]$, $l \in [1, n\_j]$, dissect the valid poll $Vc$. When the single-value condition is satisfied, the sub-polls partition the valid poll.
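A small sketch of the two-dimensional sub-polls follows. It assumes votes are lists of 0/1 lists, uses 0-based indices, and identifies votes by position; the poll data and the helper `cross_sub_poll` are invented for illustration. Rows (candidates) are rebuilt exactly from their cells, and their sizes add up because each vote carries a single age group, whereas column sums over-count votes that approve several candidates.

```python
def cross_sub_poll(valid_votes, i, k, j, l):
    """Indices of valid votes selecting value k of item i and value l of item j (Eq. 3.12)."""
    return {t for t, v in enumerate(valid_votes) if v[i][k] == 1 and v[j][l] == 1}


# Hypothetical valid poll: item 0 = 2 candidates (approval), item 1 = 3 age groups.
Vc = [
    [[1, 0], [1, 0, 0]],
    [[1, 1], [0, 1, 0]],
    [[0, 1], [0, 1, 0]],
    [[1, 0], [0, 0, 1]],
]

i, j = 0, 1
n_i, n_j = 2, 3
cells = {(k, l): cross_sub_poll(Vc, i, k, j, l) for k in range(n_i) for l in range(n_j)}

# V_k^i rebuilt from its row of cells, Eq. (3.13a).
row_unions = [set().union(*(cells[(k, l)] for l in range(n_j))) for k in range(n_i)]
# Row sizes add up because each vote has exactly one age group, Eq. (3.13c).
row_sizes = [sum(len(cells[(k, l)]) for l in range(n_j)) for k in range(n_i)]
# Column sizes can over-count because a vote may approve several candidates.
col_sizes = [sum(len(cells[(k, l)]) for k in range(n_i)) for l in range(n_j)]

print(row_unions, row_sizes, col_sizes)
```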

#### *3.3 Feature Matrix Representation*

For a given pair $i, j \in [1, m]$, let *k* correspond to the row number and *l* to the column number. Then the sub-polls $\{V\_{k,l}^{i,j}\}\_{k \in [1, n\_i], l \in [1, n\_j]}$ have a unique feature matrix representation.

#### **3.3.1 Feature Matrix**

Let $V^{i,j}$ denote a feature matrix,

$$V^{i,j} = \begin{pmatrix} V\_{1,1}^{i,j} & \dots & V\_{1,l}^{i,j} & \dots & V\_{1,n\_j}^{i,j} \\ \dots & & \dots & & \dots \\ V\_{k,1}^{i,j} & \dots & V\_{k,l}^{i,j} & \dots & V\_{k,n\_j}^{i,j} \\ \dots & & \dots & & \dots \\ V\_{n\_i,1}^{i,j} & \dots & V\_{n\_i,l}^{i,j} & \dots & V\_{n\_i,n\_j}^{i,j} \end{pmatrix}, \quad k \in [1, n\_i], l \in [1, n\_j]. \tag{3.14}$$

In statistical language, a feature matrix $V^{i,j}$ corresponds to a contingency table based on cross-classified categorical data under two selected categories [13, 16, 17]. Each element of the matrix collects a subset of votes with the respective cross-categorical meaning.

#### **3.3.2 Feature Matrix Set**

For a given $\{V\_{k,l}^{i,j}\}$, $i, j \in [1, m]$, $k \in [1, n\_i]$, $l \in [1, n\_j]$, there are a total of $2 \binom{m}{2} = m (m - 1)$ distinct feature matrices. They compose a matrix set *VS*,

$$VS = \left\{ V^{i,j} \middle| i, j \in [1, m] \right\}. \tag{3.15}$$

For a given pair $i \neq j$, $i, j \in [1, m]$, in the set, each $\{V\_{k,l}^{i,j}\}\_{k \in [1, n\_i], l \in [1, n\_j]}$ or $\{V\_{k,l}^{j,i}\}\_{k \in [1, n\_j], l \in [1, n\_i]}$ corresponds to a unique matrix or its transpose. However, for a given pair $i = j$, $i, j \in [1, m]$, the matrix is equal to its transpose. So there are a total of $m \times m - m$ different matrix representations. For a fixed item (e.g. $i = 1$) as the first index, there are a total of $\binom{m}{1} = m$ different matrices in the system to record the different relations among the sub-polls $\{V\_{k,l}^{i,j}\}$, $i, j \in [1, m]$, $k \in [1, n\_i]$, $l \in [1, n\_j]$.

Let $VSC(i)$ denote the matrix set with the first index fixed at *i*,

$$VSC(i) = \left\{ V^{i,j} \, | \, j \in [1, m] \right\}.\tag{3.16}$$

Selecting one category for both row and column values, consider $V\_{k,l}^{i,i} \in V^{i,i}$ in a given $VSC(i)$. If a vote in the *i*th category contains only one valid value, then $V\_{k,l}^{i,i}$ can be determined as follows.

$$V\_{k,l}^{i,i} = \begin{cases} \emptyset, & \text{if } k \neq l;\\ V\_k^i, & \text{if } k = l; \end{cases} \quad k, l \in [1, n\_i], i \in [1, m]. \tag{3.17a}$$

In this case, the matrix $V^{i,i}$ is a diagonal matrix.

However, if a vote in the *i*th category contains multiple distinguishable values, then $V\_{k,l}^{i,i} \in V^{i,i}$ in $VSC(i)$ provides cross-classified sub-polls.

$$V\_{k,l}^{i,i} = V\_{l,k}^{i,i}, \quad V\_k^i = \bigcup\_{l=1}^{n\_i} V\_{k,l}^{i,i} = \bigcup\_{l=1}^{n\_i} V\_{l,k}^{i,i}, \quad k, l \in [1, n\_i], i \in [1, m]. \tag{3.17b}$$

In this case, the matrix $V^{i,i}$ is a symmetric matrix.
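The diagonal and symmetric cases can be illustrated by crossing an item with itself and recording the sub-poll sizes $|V\_{k,l}^{i,i}|$. The sketch assumes votes are lists of 0/1 lists with 0-based indices; the poll and the helper `self_feature_matrix` are invented for illustration.

```python
def self_feature_matrix(valid_votes, i, n_i):
    """Matrix of sub-poll sizes |V_{k,l}^{i,i}| for one item crossed with itself."""
    return [
        [sum(1 for v in valid_votes if v[i][k] == 1 and v[i][l] == 1) for l in range(n_i)]
        for k in range(n_i)
    ]


# Hypothetical valid poll: item 0 = 3 candidates (approval), item 1 = 2 locations.
Vc = [
    [[1, 1, 0], [1, 0]],
    [[0, 1, 0], [0, 1]],
    [[1, 0, 1], [0, 1]],
]

candidates = self_feature_matrix(Vc, 0, 3)  # multiple values allowed per vote
locations = self_feature_matrix(Vc, 1, 2)   # exactly one value per vote

print(candidates)  # symmetric, with off-diagonal entries from multi-approval votes
print(locations)   # diagonal
```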

For a given $VSC(i)$ and $V\_{k,l}^{i,j} \in V^{i,j}$ in $VSC(i)$, the following equation holds.

$$V\_k^i = \bigcup\_{l=1}^{n\_j} V\_{k,l}^{i,j} \quad k \in [1, n\_i], l \in [1, n\_j], i, j \in [1, m]. \tag{3.18}$$

#### **3.3.3 Probability Feature Matrix**

Let $P^{i,j}$ denote a probability feature matrix corresponding to the feature matrix $V^{i,j}$, and let $p\_{k,l}^{i,j}$ denote its elements. For any $p\_{k,l}^{i,j} \in P^{i,j}$,

$$p\_{k,l}^{i,j} = \begin{cases} |V\_{k,l}^{i,j}| / |V\_k^i|, & V\_k^i \neq \emptyset;\\ 0, & V\_k^i = \emptyset. \end{cases} \tag{3.19}$$

$$P^{i,j} = \begin{pmatrix} p\_{1,1}^{i,j} & \dots & p\_{1,l}^{i,j} & \dots & p\_{1,n\_j}^{i,j} \\ \dots & & \dots & & \dots \\ p\_{k,1}^{i,j} & \dots & p\_{k,l}^{i,j} & \dots & p\_{k,n\_j}^{i,j} \\ \dots & & \dots & & \dots \\ p\_{n\_i,1}^{i,j} & \dots & p\_{n\_i,l}^{i,j} & \dots & p\_{n\_i,n\_j}^{i,j} \end{pmatrix}, \quad k \in [1, n\_i], l \in [1, n\_j] \tag{3.20}$$

For example, with $n\_1 = 6$ and $n\_2 = 4$, a probability feature matrix can be as follows:

$$P^{1,2} = \begin{pmatrix} 0.04 & 0.26 & 0.1 & 0.6\\ 0.42 & 0.2 & 0.3 & 0.18\\ 0.14 & 0.21 & 0.42 & 0.23\\ 0 & 0 & 0 & 0\\ 0.008 & 0.022 & 0.75 & 0.22\\ 0.33 & 0.01 & 0.23 & 0.43 \end{pmatrix}.\tag{3.21}$$
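A matrix of this kind can be produced from sub-poll sizes by applying Eq. (3.19) row by row. The sketch below is an illustration only: the counts are hypothetical (they are chosen so that the first row reproduces the first row of Eq. (3.21)), and `probability_feature_matrix` is an assumed helper name. An empty sub-poll $V\_k^i$ gives an all-zero row.

```python
def probability_feature_matrix(cell_counts, row_poll_sizes):
    """Apply Eq. (3.19): p[k][l] = |V_{k,l}| / |V_k|, or 0 when V_k is empty."""
    return [
        [count / size if size > 0 else 0.0 for count in row]
        for row, size in zip(cell_counts, row_poll_sizes)
    ]


# Hypothetical counts |V_{k,l}^{1,2}| for 3 candidates (rows) and 4 locations (columns).
cell_counts = [
    [2, 13, 5, 30],
    [0, 0, 0, 0],
    [10, 10, 20, 10],
]
# Each vote has exactly one location, so |V_k^1| is the row total (Eq. 3.13c).
row_poll_sizes = [sum(row) for row in cell_counts]

P = probability_feature_matrix(cell_counts, row_poll_sizes)
row_sums = [sum(row) for row in P]

for row in P:
    print(row)
```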

#### *3.4 Probability Feature Vector*

For any $P^{i,j}$, at most $n\_i$ row vectors in the matrix need to satisfy Eq. (3.22).

$$1 = \sum\_{l=1}^{n\_j} p\_{k,l}^{i,j}, \quad k \in [1, n\_i], l \in [1, n\_j], i, j \in [1, m]. \tag{3.22}$$

Equation (3.22) follows from Eq. (3.13c) if the column items partition the sub-poll of the given row.

Because there is no restriction among the columns of the probability feature matrix $P^{i,j}$, it is flexible to select different categories that partition a given vote set into multiple distributions $p\_{k,l}^{i,j}$ in larger selection spaces, to satisfy complicated dynamic system requirements.

For a given $P^{i,j}$, if the *i*th item is a categorical index of candidates, then any candidate $k \in [1, n\_i]$ has a probability feature vector, corresponding to its probability densities relevant to item *j*, denoted by $\Psi\_k^{i,j}$.

$$\Psi\_k^{i,j} = \left( p\_{k,1}^{i,j}, \dots, p\_{k,l}^{i,j}, \dots, p\_{k,n\_j}^{i,j} \right), k \in [1, n\_i], l \in [1, n\_j], i, j \in [1, m] \tag{3.23}$$

#### *3.5 Differences Between Two Probability Vectors*

Let the sub-polls $\{V\_l^i\}\_{l \in [1, n\_i]}$, together with the invalid poll, form a vector $\tilde{V}^i = (V\_0, V\_1^i, \dots, V\_l^i, \dots, V\_{n\_i}^i)$, $l \in [1, n\_i]$. This vote vector corresponds to a probability vector $\tilde{\Psi}^i = (\tilde{p}^0, \tilde{p}\_1^i, \dots, \tilde{p}\_l^i, \dots, \tilde{p}\_{n\_i}^i)$, $l \in [1, n\_i]$. Let

$$\tilde{p}\_l^i = |V\_l^i| / (|V^i| + |V\_0|) = N\_l / N, l \in [1, n\_i] \tag{3.24}$$

and

$$\tilde{p}^0 = |V\_0| / \left( |V^i| + |V\_0| \right) = N\_0 / N, i \in [1, m]. \tag{3.25}$$

Let the sub-polls $\{V\_l^i\}\_{l \in [1, n\_i]}$ form a vector $V^i = (V\_1^i, \dots, V\_l^i, \dots, V\_{n\_i}^i)$, $l \in [1, n\_i]$, and

$$p\_l^i = |V\_l^i| / |V^i| = N\_l / (N - N\_0), \quad l \in [1, n\_i] \text{ and } i \in [1, m]. \tag{3.26}$$

The vector $V^i$ corresponds to a probability vector $\Psi^i$,

$$\Psi^{i} = (p\_1^i, \dots, p\_l^i, \dots, p\_{n\_i}^i), l \in [1, n\_i]. \tag{3.27}$$

If the *i*th item of a vote indicates an ordinal number of candidates in an election, the probability vector $\Psi^i$ is a special case of a linear spectral distribution.

For any *l*th candidate, if $1 \ge \tilde{p}\_l^i \gg \tilde{p}^0 \ge 0$, then $\tilde{p}\_l^i \sim p\_l^i$.

Considering the difference between the two probability measures,

$$\begin{split}p\_l^i - \tilde{p}\_l^i &= N\_l / (N - N\_0) - N\_l / N\\ &= N\_l N\_0 / \left( N (N - N\_0) \right) \\ &= N\_l / (N - N\_0) \times N\_0 / N \\ &= p\_l^i \times \tilde{p}^0 \ge 0 \to 0. \end{split} \tag{3.28}$$

Equation (3.28) indicates that, when the probability measure of invalid votes is small compared with the candidate measures, there is no significant difference between the two probability measures $\tilde{p}\_l^i$ and $p\_l^i$ of a candidate in the two probability vectors $\tilde{\Psi}^i$ and $\Psi^i$ respectively.
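Equation (3.28) can be checked numerically. The counts below are hypothetical (the total and the number of invalid votes echo the figures quoted earlier in the chapter, while the candidate's count `N_l` is invented); the snippet only confirms that the difference of the two measures equals $p\_l^i \times \tilde{p}^0$ and is small.

```python
# Hypothetical counts: N votes in total, N0 invalid, N_l valid votes for candidate l.
N, N0, N_l = 6_000_000, 14_000, 2_900_000

p_tilde_l = N_l / N          # Eq. (3.24)
p_tilde_0 = N0 / N           # Eq. (3.25)
p_l = N_l / (N - N0)         # Eq. (3.26)

difference = p_l - p_tilde_l
product = p_l * p_tilde_0    # right-hand side of Eq. (3.28)

print(f"p_l - p~_l  = {difference:.6f}")
print(f"p_l * p~^0  = {product:.6f}")
```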

If any *l*th and *g*th candidates gain a similar number of votes in an election so that the uncertain condition is satisfied, then the difference between the probability measures $p\_l^i$ and $p\_g^i$ is restricted by the uncertain condition too.

Considering the probability measure difference under the uncertain condition, their difference is

$$\begin{split} |\tilde{p}\_l^i - \tilde{p}\_g^i| &= |\tilde{p}\_l^i - p\_l^i + p\_l^i - \tilde{p}\_g^i + p\_g^i - p\_g^i| \\ &= |p\_l^i - p\_g^i - (p\_l^i - \tilde{p}\_l^i) + (p\_g^i - \tilde{p}\_g^i)| \end{split} \tag{3.29}$$

$$\because \left( p\_l^i - \tilde{p}\_l^i \right) + \left( p\_g^i - \tilde{p}\_g^i \right) = \left( p\_l^i + p\_g^i \right) \times \tilde{p}^0 \ge 0,\tag{3.30}$$

$$\begin{split} \left| p\_l^i - p\_g^i \right| &\le \left| \tilde{p}\_l^i - \tilde{p}\_g^i \right| + \left( p\_l^i + p\_g^i \right) \times \tilde{p}^0 \le \tilde{p}^0 + \left( p\_l^i + p\_g^i \right) \times \tilde{p}^0 \\ \therefore \left| p\_l^i - p\_g^i \right| &\le 3 \times \tilde{p}^0. \end{split} \tag{3.31}$$

Equation (3.31) indicates that the new probability vector does not solve the uncertain problem. To overcome the difficulty, other techniques need to be employed.
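As a cross-check, the relation between the two measures can also be written in closed form. The short derivation below is an added illustration that uses only the definitions in Eqs. (3.24)–(3.26):

$$\tilde{p}\_l^i = \frac{N\_l}{N} = \frac{N\_l}{N - N\_0} \times \frac{N - N\_0}{N} = p\_l^i \times \left( 1 - \tilde{p}^0 \right)$$

$$\tilde{p}\_l^i - \tilde{p}\_g^i = \left( 1 - \tilde{p}^0 \right) \times \left( p\_l^i - p\_g^i \right)$$

Switching from $\tilde{p}$ to $p$ therefore rescales every candidate's measure by the same factor $1 / (1 - \tilde{p}^0)$ and leaves the ratio between two candidates unchanged, which is consistent with the statement that the new probability vector cannot by itself separate them.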

#### *3.6 Permutation Invariant Group*

For any $\Psi\_k^{i,j}$, a permutation invariant group $\Omega(i, j|k)$ can be constructed to collect the vectors obtained by using all elements of $\Psi\_k^{i,j}$ as constructors of possible permutations.

#### **3.6.1 Feature Index and Permutation Invariant Family**

For a vector $\Psi \in \Omega(i, j|k)$, if it is feasible to define a numeric measure $\lambda$ (or feature index) such that all vectors $\forall \Psi \in \Omega(i, j|k)$ have the same index, then the feature index $\lambda$ is an **invariant** of $\Omega(i, j|k)$.

For $\forall \Psi \in \Omega(i, j|k)$,

$$\exists \lambda | \lambda(\Psi) = \lambda(\Psi') = c, \ \Psi \neq \Psi'; \ \Psi, \Psi' \in \Omega(i, j|k), k \in [1, n\_i], l \in [1, n\_j], i, j \in [1, m] \tag{3.32}$$

#### **3.6.2 Polynomial Feature Index Family**

For any probability vector $\Psi = (p\_1, \dots, p\_j, \dots, p\_m)$ with *m* items and $\exists k \in [1, m]$, $p\_k > 0$, a family of polynomial indexes $\{\lambda\_n\}$ is defined by Eqs. (3.33)–(3.36).

$$\lambda\_0(\Psi) = \sum\_{l=1}^{m} \left( p\_l \right)^0 = m;\tag{3.33}$$

$$\lambda\_1(\Psi) = \sum\_{l=1}^{m} (p\_l)^1 = 1;\tag{3.34}$$

$$\lambda\_2(\Psi) = \sum\_{l=1}^{m} (p\_l)^2;\tag{3.35}$$

$$\lambda\_n(\Psi) = \sum\_{l=1}^{m} (p\_l)^n, \quad n \ge 0. \tag{3.36}$$

For example, using the sample probability matrix $P^{1,2}$ of Eq. (3.21), its polynomial indexes $\{\lambda\_n\}$ are

$$
\lambda\_0(P^{1,2}) = \begin{pmatrix} 4 \\ 4 \\ 4 \\ 4 \\ 4 \\ 4 \end{pmatrix}; \quad \lambda\_1(P^{1,2}) = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 1 \\ 1 \end{pmatrix}; \quad \lambda\_2(P^{1,2}) = \begin{pmatrix} 0.4392 \\ 0.3388 \\ 0.293 \\ 0 \\ 0.611448 \\ 0.3468 \end{pmatrix};$$

$$
\lambda\_3(P^{1,2}) = \begin{pmatrix} 0.23464 \\ 0.11492 \\ 0.09826 \\ 0 \\ 0.43253416 \\ 0.127612 \end{pmatrix}; \dots$$
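The polynomial indexes can be recomputed directly from Eq. (3.36). The sketch below copies the matrix of Eq. (3.21) and evaluates one index per row; the helper name `polynomial_index` is an assumption. Note that Python evaluates `0 ** 0` as 1, so $\lambda\_0$ counts the entries of a row even when the row is all zeros.

```python
# Probability feature matrix P^{1,2} copied from Eq. (3.21).
P12 = [
    [0.04, 0.26, 0.1, 0.6],
    [0.42, 0.2, 0.3, 0.18],
    [0.14, 0.21, 0.42, 0.23],
    [0, 0, 0, 0],
    [0.008, 0.022, 0.75, 0.22],
    [0.33, 0.01, 0.23, 0.43],
]


def polynomial_index(vector, n):
    """lambda_n of Eq. (3.36): the sum of the n-th powers of the entries."""
    return sum(p ** n for p in vector)


lambda_0 = [polynomial_index(row, 0) for row in P12]
lambda_2 = [polynomial_index(row, 2) for row in P12]
lambda_3 = [polynomial_index(row, 3) for row in P12]

for k, row in enumerate(P12, start=1):
    print(k, lambda_0[k - 1], round(lambda_2[k - 1], 8), round(lambda_3[k - 1], 8))
```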

#### **3.6.3 Entropy Feature Index**

For a probability vector $\Psi = (p\_1, \dots, p\_j, \dots, p\_m)$ with *m* items, an entropy feature index $\lambda\_E$ is defined by Eq. (3.37).

$$\lambda\_E(\Psi) = -\sum\_{l=1}^m p\_l \ln(p\_l). \tag{3.37}$$

In the polynomial index family $\{\lambda\_n(\Psi)\}\_{n \ge 0}$, $\lambda\_0(\Psi)$ indicates the length of the vector and $\lambda\_1(\Psi)$ provides the normalized measure. In addition to the $\{\lambda\_n(\Psi)\}\_{n \ge 0}$ family, $\lambda\_E(\Psi)$ provides another type of index related to the entropy measurement. Using one of these indexes, it is feasible to distinguish two probability vectors in different permutation groups.

For example, using the same probability matrix $P^{1,2}$ of Eq. (3.21), its entropy index $\lambda\_E$ is

$$
\lambda\_E(P^{1,2}) = \begin{pmatrix}
1.015748065 \\
1.356093379 \\
1.305367539 \\
0 \\
0.6714638476 \\
1.112842971
\end{pmatrix}.
$$
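The entropy index of Eq. (3.37) can be evaluated in the same way. The sketch assumes the usual convention that a term with $p\_l = 0$ contributes 0, which is what gives the all-zero row an index of 0; `entropy_index` is an assumed helper name, and only three rows of Eq. (3.21) are used.

```python
import math


def entropy_index(vector):
    """lambda_E of Eq. (3.37), with the usual convention 0 * ln(0) = 0."""
    return -sum(p * math.log(p) for p in vector if p > 0)


# Rows 1, 4 and 5 of P^{1,2} in Eq. (3.21).
row_1 = [0.04, 0.26, 0.1, 0.6]
row_4 = [0, 0, 0, 0]
row_5 = [0.008, 0.022, 0.75, 0.22]

entropies = [entropy_index(r) for r in (row_1, row_4, row_5)]

for value in entropies:
    print(round(value, 9))
```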

#### *3.7 Two Probability Vectors and Their Feature Indexes*

If two probability vectors $\Psi\_k^{i,j}$ and $\Psi\_l^{i,j}$ have two distinct index families $\{\lambda\_n(\Psi\_k^{i,j})\}\_{n \ge 0}$ and $\{\lambda\_n(\Psi\_l^{i,j})\}\_{n \ge 0}$, with $\exists \tau$, $\lambda\_\tau(\Psi\_k^{i,j}) \neq \lambda\_\tau(\Psi\_l^{i,j})$, $1 < \tau \le \lambda\_0(\Psi\_l^{i,j})$, then the two vectors belong to two different permutation groups.

Conversely, if each of two probability vectors $\Psi\_k^{i,j}$ and $\Psi\_l^{i,j}$ belongs to its own permutation group and neither can be generated from the other, then $\exists n > 1$, $\lambda\_n(\Psi\_k^{i,j}) \neq \lambda\_n(\Psi\_l^{i,j})$, $1 < n \le \lambda\_0(\Psi\_l^{i,j})$.

Under such conditions, if two vectors have different index families, then they are in different permutation groups. Conversely, when neither vector can be generated from the other, at least one index is distinguishable.
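Both directions can be illustrated on two rows of Eq. (3.21). In the sketch below, every reordering of one vector yields the same polynomial index family, while a vector that is not a reordering of it already differs at $n = 2$. The helper `index_family` is an assumption, and the rounding to 12 decimals is only there to absorb floating-point noise from summing in different orders.

```python
from itertools import permutations


def index_family(vector, max_n):
    """Polynomial indexes lambda_1 .. lambda_max_n, rounded to absorb float noise."""
    return tuple(round(sum(p ** n for p in vector), 12) for n in range(1, max_n + 1))


psi_k = (0.04, 0.26, 0.1, 0.6)     # row 1 of Eq. (3.21)
psi_l = (0.14, 0.21, 0.42, 0.23)   # row 3 of Eq. (3.21)

# Every reordering of psi_k yields the same index family.
families_k = {index_family(perm, 4) for perm in permutations(psi_k)}

# psi_l is not a reordering of psi_k, and the two families differ.
same_group = sorted(psi_k) == sorted(psi_l)
differs = index_family(psi_k, 4) != index_family(psi_l, 4)

print(len(families_k), same_group, differs)
```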

#### *3.8 CBM Construction*

Let CBM denote a Component Ballot Model. A CBM is a collection consisting of a ballot form, vote sequences, a poll and its component matrix collection, probability matrix collections with normalized probability vectors, and the selected indexing family for an election.

$$\text{CBM} = \left( B \middle| V, VS, \left\{ P^{i,j} \right\}, \{ \lambda\_i \} \right). \tag{3.38}$$

Comparing SBM (Eq. 2.3) with CBM (Eq. 3.38), it is clear that SBM is the simplest case of CBM, and that CBM provides more powerful properties for refined descriptions and comparisons in complicated voting applications.

**Two-D Separable Proposition** For two candidates who gain a similar number of votes under the uncertain condition, it is always feasible to use other categorical information (e.g. location, age group) to re-partition the sub-polls for each candidate. If the two refined probability feature vectors belong to two different permutation groups, then the uncertain problem can be solved in most scenarios by using the polynomial feature index family or the entropy feature index.

*Proof* In most scenarios, cross-classified categorical data give the corresponding probability feature vectors significant differences in their respective density distributions. Under different categories without simple correspondences, this mechanism makes it possible to use the same strategy to handle votes for all candidates. Since one party may be very strong in certain policies and relatively weak in other strategies, those differences create probability feature vectors that are more easily located in different permutation groups. Even in the most balanced election events from a global viewpoint, hugely distinguishable distributions exist in local regions. This is the most important reason why two probability feature vectors produce a pair of significantly distinct feature indexes. □

In a complex dynamic system, equilibrium is the most probable state when the system is in dynamic balance. However, there are significant differences among local areas even under the most balanced conditions. This is the most powerful part of the proposed model for resolving uncertainty in complex dynamic systems in general.

To avoid uncertainty, and the frustration caused by an uncertain voting result, an election needs to pre-select an odd number $m - 1 \ge 1$ of additional categories different from the candidates. The following main conclusion can be stated.

**Voting Authority Proposition** If two candidates in an election under the approval rule are in uncertainty, then additional categories (an odd number $m - 1 \ge 1$) under pre-agreed conditions can be used. These create $m - 1$ pairs of feature indexes for deciding who will be the winner.

*Proof* According to the Two-D separable proposition, each additional category can provide a pair of significantly distinct feature indexes to separate the two candidates, and all selected $m - 1$ pairs have this property. Since $m - 1$ is an odd number, each pair of indexes acts as an authority vote, so the majority rule can be used to make the decision. □

#### **4 Conclusion and Further Work**

In the proposed Component Ballot Model, multiple probability-feature matrix collections are employed and component categories other than the candidate are proposed on ballot papers to overcome confusion and frustration when two candidates are in uncertainty.

Applying advanced invariant constructions to probability feature vectors and also distinguishable properties among measurements in polynomial and entropy feature index families, voting authority provides a stable indexing mechanism to make the whole calculation based on valid votes. Distinguishable properties and invariant properties among feature index families provide reliable measurements for election outcomes.

The basic ideas, tools and technologies in this chapter originated from the author's research in the 1990s on advanced content-based information retrieval and image feature indexing [18–20].

Because the approval rule is only one of the rules used in practical voting systems, readers may consult the author's other paper discussing related aspects of voting theory under plurality and majority rules [21]. It would be interesting to know whether the proposed model can be applied consistently to other voting systems (such as Borda rules, proportional-representation systems and preference voting systems). Similar uncertainty exists in other voting mechanisms, so this will be a natural extension of the current study.

To satisfy practical voting systems, it is essential to establish testing frameworks to make recommendations for the specific invariant properties contained in the proposed or new indexing families. There is no doubt that different voting systems may require various combinations of different feature indexing schemes to satisfy their optimal properties. More case studies linking between theoretical models and practical applications should be conducted to solve complicated voting paradoxes and other similar problems.

**Acknowledgements and Disclaimer** The author would like to express his gratitude to Dr. Wilson Wen for distinguishing the relationship between a feature matrix and a contingency table. Sincere thanks also go to Dr. Gangjun Liu, Dr. Grahame Smith, Ms. Wilna Macmillan and Dr. Wen Dai for their invaluable comments, suggestions, modifications and careful proofreading of the manuscript. The constructions and conclusions contained in the chapter are merely the author's personal opinion as a scientist, from a complex dynamic system view. The author takes full responsibility for the contents. No government agency or company should bear responsibility for the chapter.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part V Applications—Global Variant Functions

The only thing permanent is change.

—Immanuel Kant

The scientist needs an artistically creative imagination.

—Max Planck

The thought: A logical inquiry.

—Gottlob Frege

Extensive research on global functions and their distributions was published in the period 2000–2010. Conjugate transformation and content-based image retrieval are typical examples of this development. Using a hierarchical knowledge-model architecture, multiple levels of balanced structures were developed both in image analysis and processing, e.g., Towards Automated Mammographic Image Analysis, Proceedings of the 2005 IEEE International Conference on Information Acquisition 85–90, and in content-based retrieval, e.g., Mixed Query Image Retrieval System, Proceedings of the 2007 IEEE International Conference on Information Acquisition, DOI: https://doi.org/10.1109/ICIA.2007.4295776.

In association with variant logic and various applications, wider explorations were carried out in the field of cellular automata functions, which were examined under different symmetric conditions. For example, Permutation and Complementary Algorithm to Generate Random Sequences for Binary Logic, International Journal of Communications, Network and System Sciences 4(5):345–350, 2011.

This part on global variant functions is composed of five chapters (11–15).

Chapter "Biometrics and Knowledge Management Information Systems" describes a hierarchical framework for using the concept cell model in Biometrics & KMIS applications. Searching for brides and fingerprints are samples of typical applications, in addition to the processing of SARS and fingerprint images.

Chapter "Recursive Measures of Edge Accuracy on Digital Images" uses recursive measures to handle image edges under different conditions to compare various edge algorithms, edge quality, and their accuracies. Conjugate maps and four other edge schemes {Gradient, Laplacian, Gaussian, Mathematical Morphology} were selected.

Chapters "2D Spatial Distributions for Measures of Random Sequences Using Conjugate Maps" to "3D Visual Method of Variant Logic Construction for Random Sequence" use the variant logic framework to illustrate 2D/3D visual maps of variant logic operations under n = 2 conditions, showing global visual distributions in their configurations of functional spaces.

# **Biometrics and Knowledge Management Information Systems**

**Jeffrey Zheng and Chris Zheng**

**Abstract** Biometrics and knowledge management information systems are two important fields that have attracted wide attention from different social groups in recent years. This chapter explores the use of hierarchical construction to link biometrics applications with knowledge management information systems. The key issues are discussed, and a sample case of information acquisition in a content-based image retrieval system is illustrated.

**Keywords** Biometrics · Complexity · Hierarchical organization · Feature classification · Content-based image retrieval

## **1 Introduction**

Biometrics has attracted people's attention in recent years due to terrorist attacks, rapid scientific development and advanced information technology. In the twenty-first century, one of the most significant achievements in biology has been decoding the full list of gene codes of human DNA sequences. Using advanced pattern recognition technology, it is now convenient to perform real-time face verification and fingerprint identification.

J. Zheng (B)

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

C. Zheng Tahto, Sydney, Australia e-mail: z@caudate.me

This work was supported by Australian Commercialising Emerging Technologies, (COMET) program.

In general, all quantitative measures of living objects and activities from different sources, including biology, anatomy, sound, photo, electronics and nerve pulses, could be linked to biometrics. In such extremely complicated fields, if we can efficiently acquire essential information to be manipulated by knowledge management information systems, then this mechanism will play an important role in the practice of applied biometrics. Useful concepts, methodologies and software/hardware toolkits in this direction will be invaluable for biometric applications in practical environments.

To resolve real-world problems, it is useful to apply system engineering schemes using analysis and synthesis mechanisms. In this chapter, hierarchical construction will be used as a framework to represent biometrics and knowledge management information systems. The original concepts and methodologies used in the chapter come from an established theoretical construction of dynamic systems conjugate classification and transformation [1–3]. Main algorithms and methods from the concepts have been implemented into software packages in advanced image analysis, content-based image retrieval and image understanding systems.

Using these concepts and methodologies in biometrics is a new application. The authors would welcome the opportunity to discuss this possibility in detail with other experts in the field.

#### **2 Different Complexity Issues in Biometrics Applications**

Different measurements may have variant forms and contents in practical biometrics applications. In a measure space, a measured data set can be relevant to length, position, angle, time and other basic measurable quantities. Using the dimension number of geometric spaces to represent different biometrics objects has proved extremely useful in many applications. Very rich contents can be observed through representative biometrics measures.

#### **Infrared Detector for SARS Detection (1D body temperature > 38 °C)**

To limit the spread of the SARS virus, infrared detectors installed on the major channels of airports, stations and customs played an active role in indirectly measuring whether body temperature was higher than 38 °C. This process significantly reduced the fatal spread of the SARS virus.

#### **DNA sequence (1.5D sequence)**

A DNA sequence is composed of four types of gene codes forming a linear structure of conjugate pairs. Since the sequence itself has very complicated combinatorial characteristics and also local grouping properties, its structure is much more complex than a simple 1D linear sequence [4].

#### **Face identification and early breast cancer detection (2D)**

Most image analysis systems, especially face identification and early breast cancer detection systems, use 2D features in their manipulations. In larger applications or data sets, those feature spaces are very complicated.

#### **CT scanning and reconstruction (3D and higher D)**

Using modern CT scan medical imaging equipment, it is feasible to reconstruct 3D images from sequences of multiple 2D image slices to represent complicated projection and dynamic properties of areas and organs of interest. 3D visualization has much more complicated properties than the 2D image visualization process.

#### **Retinal analysis and synthesis (higher D nerve network)**

The detailed principles of the retinal nerve network in human vision are not fully understood, but its biological structures are well recognized as interconnected nerve networks. This type of connectivity is much higher than three dimensions. The corresponding symptoms of distributions among brain surfaces and visual simulations naturally indicate hierarchical structures in optical nerve systems [5].

#### **Abstract Thinking (Super Hypercomplex Cells)**

The capacity of abstract thinking may belong to super hierarchical organizations of nerve systems. If there are real nerve objects, this structure could be super hypercomplex cells or their superposition on extensive hierarchy [5].

From a certainty viewpoint, lower dimension cases have more certain properties than higher dimension cases. In addition, higher dimension structures express abstract properties with more variables and richer possibilities in real-world cases.

#### **3 Proper Concepts, Methods and Useful Toolkits**

Using modern mathematical toolkits, concepts and methods such as geometric topology and combinatorial topology, it is feasible to use basic analysis of the neighbourhood relationships of a kernel structure to partition complicated systems into a base family of non-reducible invariant characteristics. Using non-reducible bases as generators, it is possible to apply synthesis techniques to rebuild complicated systems in certain forms [6]. In applications relevant to invariant and singularity analysis, global topological characteristics play core roles using modern mathematical analysis toolkits [7]. Since connectivity is one of the topological properties, higher dimensional geometric problems could be represented as graph problems or in other forms, so that common probability and statistical methods can be used in practical calculations to resolve the equivalent problems to a certain degree [8]. No matter how a certain problem is represented in detail, abstract concepts can always be represented as lattice structures.

After systematic analysis of modern knowledge management information systems at the concept, principle and operational levels, a useful kernel structure, the Concept Cell Model for knowledge management using directed acyclic lattices in hierarchical constructions, has been proposed as a base construction toolkit for representation [9, 10]. The model distinguishes two similar lattices, each with three essential concept levels, in different abstract structures as building blocks of lattice constructions:

Time Invariant Structure: Descriptive Knowledge Lattice (Tacit, Implicit, Explicit)

Time Variable Structure: Procedure Knowledge Lattice (Start, Operation, Finish).

Under hierarchical construction, it is convenient and efficient to represent knowledge systems for information requests, abstract representation, categories, organization and other static and dynamic application requirements.

Concept cells in hierarchies can efficiently represent everything from real measurement data sets to higher levels of conceptual networks, so that application systems are represented as multiple levels of organization. This provides an operational knowledge management framework that flexibly supports use cases, abstract design, implementation and operation requirements in system engineering practice. By applying conceptual categories, it is feasible to construct useful application systems with powerful self-organization and self-learning capacities in wider engineering and social environments.

To make the main point easier to understand, Fig. 1 shows an example of a partial structure in an implemented content-based image retrieval system using hierarchical concept structures.

In the construction, a single index represents specific content-based information extracted from an image. A set of images corresponds to a set of indexes, organized as a list. It is convenient to use a multiple-level hierarchy to organize the list of single indexes as its end nodes. Each intermediate node can be established as a combined index, that is, a group of multiple indexes with strong similarity in their contents. In this way, a root node can be established by combining individual nodes and intermediate nodes to represent the whole set of indexes. Three types of information can be distinguished as follows:

**Fig. 1** Descriptive lattice in hierarchical representation

Single index: individual information, explicit

Combined index: group information, implicit

Root index: whole information, tacit.

Using the descriptive lattice structure in multiple levels of representation, a complicated content-based image retrieval system can be mapped to a multiple-layer network structure. This provides an efficient organization for information acquisition, linking individuals, groups and the whole in information network construction.

During a search operation, the current index is checked from the root (tacit node) through combined indexes (implicit nodes) to single indexes (explicit nodes) to obtain the best-matched cases in the hierarchy. Using the best match information, a selected image group is determined as the output result.
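The top-down search described above can be sketched as follows. This is only an illustration under stated assumptions, not the implemented retrieval system: the feature vectors, the use of a mean vector as a combined index, the Euclidean distance as the similarity measure, and the names `IndexNode`, `combine` and `search` are all hypothetical and introduced here for the example. The hierarchy is assumed to have uniform depth.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Optional, Tuple


@dataclass
class IndexNode:
    """A node of the descriptive lattice (hypothetical structure).

    Leaves are single indexes (explicit), inner nodes are combined
    indexes (implicit), and the top node is the root index (tacit).
    """
    feature: Tuple[float, ...]
    image_id: Optional[int] = None          # set only on single indexes
    children: List["IndexNode"] = field(default_factory=list)


def combine(children: List[IndexNode]) -> IndexNode:
    """Build a combined index whose feature is the mean of its children (assumption)."""
    n = len(children)
    size = len(children[0].feature)
    mean = tuple(sum(c.feature[k] for c in children) / n for k in range(size))
    return IndexNode(feature=mean, children=children)


def search(root: IndexNode, query: Tuple[float, ...]) -> List[int]:
    """Descend from the root, following the best-matching child at each level.

    Returns the image ids of the selected group, most similar first.
    """
    node = root
    # Keep descending while the children are themselves combined indexes.
    while node.children and node.children[0].children:
        node = min(node.children, key=lambda c: dist(c.feature, query))
    leaves = node.children or [node]        # a childless root is a single index
    ranked = sorted(leaves, key=lambda c: dist(c.feature, query))
    return [leaf.image_id for leaf in ranked]
```

With two groups built by `combine` and joined under one root, `search(root, query)` returns the image ids of the closest group ordered by similarity, which mirrors the "higher to lower" ordering of the retrieved lists.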

In Fig. 2, two sets of implemented results on bride and fingerprint verification are provided to illustrate the visual quality of retrieved output results. The 125th bride image is selected as a query, and a list of similar brides is shown as the retrieved result. The 194th fingerprint image is selected as a query example, and the output result is shown in the right panel, arranged by similarity from higher to lower values over the best 20 matched images from the image database, in which the 194th, 193rd and 195th images are strongly related fingerprints from the same person.

Two sets of image processing results are shown in Figs. 3, 4 and 5. In Fig. 3, four enhanced results on an original SARS image are selected. In Figs. 4 and 5, various results of a fingerprint image are processed with different parameters under special enhancement functions.

#### **4 Demand in Future Society**

From a biometrics measure viewpoint, measured data can be very accurate and certain as numeric values. However, through hierarchical construction, more uncertainty appears in higher level contents. Complicated interconnections link simple single measures to a complicated global organization. Using hierarchical construction, it is feasible to organize single, group and whole information through network construction to cover wider applications.

In the rapid development of web-based networks, high-speed interactive facilities and quick connections have changed traditional concepts and methods significantly. It is a convenient approach to use knowledge management information systems for information acquisition, intelligent analysis, combination and synthesis.

**Fig. 2** Search results: **a** Brides; **b** Fingerprints

**Fig. 3** Four image enhancements on SARS image (**a**–**e**); **a** Original; **b** Positive enhanced; **c** Valley enhanced; **d** Hill enhanced; **e** Negative enhanced

Hierarchical operations become the most advanced parts of optimal control and best operational strategies. In the current application environment, fast, convenient and efficient design and implementation can find wide application in many fields. Automatic and intelligent methodologies can be expected to handle complicated issues, especially complex and time-consuming design processes. Facing many practical applications, simple and unified concepts can help larger dynamic systems form stable structures. Global interactive connections and their evaluation will help the social environment achieve high-speed and sustainable development.

**Fig. 4** Four image enhancements on fingerprint image (**a**–**e**); **a** Original; **b** Positive enhanced; **c** Valley enhanced; **d** Hill enhanced; **e** Negative enhanced

#### **5 Base Strategy of Development**

No theoretical scheme can ensure its own success in practical operation without carefully matching environmental requirements. In current social and economic conditions, it is more important for biometrics to make a positive impact on the social economy and help existing developments. Market-oriented mechanisms can be used to resolve key problems in applications. It is most important to identify the core technology in the application and to concentrate the required energy and resources on it to achieve a significant impact.

In knowledge management information systems, content-based acquisition, representation, indexing and retrieval components are the core components for automatic organization and highly efficient retrieval. Ultra-fast and accurate retrieval technology for databases and meta-knowledge bases can be widely used in many applications to satisfy information acquisition, extraction, categorization, organization, storage and retrieval requirements. Under a global web-based environment, hierarchical organization of knowledge management systems and biometrics will be further refined and developed in health environments.

**Fig. 5** Ten enhanced results of a fingerprint image (**a**–**c**); **a** Original; **b1**–**b5** Hill enhanced; **c1**–**c5** Valley enhanced; **b1**/**c1** α = 30; **b2**/**c2** α = 80; **b3**/**c3** α = 128; **b4**/**c4** α = 160; **b5**/**c5** α = 220

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Recursive Measures of Edge Accuracy on Digital Images**

**Jeffrey Zheng and Chris Zheng**

**Abstract** In this chapter, an edge accuracy model is proposed for digital images, and five types of edge detection methods are discussed as examples to investigate their edge maps under recursive operations. Using an invariant criterion, it is possible to compare different schemes in accuracy, consistency, completeness and simplicity. This provides a general mechanism for accurate edge extraction from digital images.

**Keywords** Edge detection · Accuracy · Invariant · Digital image

## **1 Introduction**

Edge detection is of fundamental importance in image analysis, processing and computer vision applications. As the first step of visual perception, it has been the focus of extensive R&D for 40 years (and drawing arts in human civilization go back more than forty thousand years). Many useful edge detection operators have been invented and widely applied.

From an operational viewpoint, edge detection creates edge maps from images, as shown in Fig. 1a. Edge detection operators identify significant changes in visual objects as their edges or contours. From a historical viewpoint, common edge detection approaches are divided into five categories. Traditional edge detection has three main categories: Gradient, Laplacian and Gaussian; the other two categories are mathematical morphology and conjugate. The five categories are briefly introduced as follows.

J. Zheng (B)

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, Yunnan, China e-mail: conjugatelogic@yahoo.com

C. Zheng Tahto, Sydney, Australia e-mail: z@caudate.me

This work was supported by Australian Commercialising Emerging Technologies, (COMET) program.

**Fig. 1** Recursive edge extraction. **a** Edge detection; **b** Recursive edge maps; **c** Edge accuracy measures

#### *1.1 Gradient*

The gradient scheme has a direction corresponding to convolution operations. We can use 2×2 or 3×3 matrices, or more complicated schemes, to construct relevant operators. For example, the Roberts operator uses 2×2 matrices to detect edges along the main diagonal or anti-diagonal directions. The Prewitt, Sobel and Isotropic schemes take 3×3 matrices with different parameters to extract horizontal or vertical edges from digital images, as shown in Fig. 2a.
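As a minimal sketch of the gradient scheme, the fragment below applies the standard Roberts (2×2) and Sobel (3×3) kernels to a grey-level image stored as a list of rows. The helper names `correlate` and `gradient_magnitude`, the restriction to the unpadded region, and the |gx| + |gy| magnitude are choices made for this illustration; they are not taken from the chapter.

```python
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
# Roberts cross kernels (2x2) respond to the two diagonal directions.
ROBERTS_A = [[1, 0], [0, -1]]
ROBERTS_B = [[0, 1], [-1, 0]]


def correlate(img: Image, kernel: List[List[int]]) -> Image:
    """Slide the kernel over the image (unpadded region only).

    This is correlation; it equals convolution up to a flip of the kernel,
    which only changes the sign of the response for these kernels.
    """
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def gradient_magnitude(img: Image, kx, ky) -> Image:
    """Combine two directional responses as |gx| + |gy| (a common approximation)."""
    gx, gy = correlate(img, kx), correlate(img, ky)
    return [[abs(a) + abs(b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
```

For example, `gradient_magnitude(img, SOBEL_X, SOBEL_Y)` responds strongly along a vertical step in `img`, while the Roberts pair responds only in the 2×2 windows that straddle the step.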

#### *1.2 Laplacian*

A typical Laplacian scheme is Marr–Hildreth zero crossings. This scheme uses second-derivative information to determine zero crossings at the edges, as shown in Fig. 2b.

#### *1.3 Gaussian*

The Canny edge filter has played a significant role in advanced edge detection applications since the late 1980s. This scheme applies a Gaussian smoothing filter first, then gradient operations and finally thinning processes; its final results are shown in Fig. 2c. Different from Gradient and Laplacian schemes, Canny edge detection provides controllable parameters to balance noise levels and significant edge components. Because of its controllable properties, this scheme is widely used in many practical applications in relation to significant edge components.

#### *1.4 Mathematical Morphology*

Mathematical morphology has played an important role in advanced image analysis and processing applications since the 1980s. Using discrete patterns as morphological masks, the method applies erosion and dilation, and opening and closing operations, to the processed images. This method distinguishes edge and non-edge masks. In general, only translation invariance can be retained in operations. Each basic operation uses one mask for either erosion or dilation, correspondingly reducing or extending the boundaries of visual objects. There is no simple relationship between the selected mask states and edge states. Two edge maps using a crossing mask under either erosion or dilation are shown in Fig. 2d. Each edge map is calculated by subtracting the input edge map from either the dilated or the eroded output image.
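The dilation and erosion edge maps can be sketched for binary images as below. This is an illustrative reading of the description above, with several assumptions: images are lists of 0/1 rows, pixels outside the image count as background, the crossing mask is the 5-pixel cross, and the "subtraction" is taken as a pixelwise absolute difference. The function names are hypothetical.

```python
from typing import List

Binary = List[List[int]]

# Cross-shaped mask given as (row, column) offsets from the centre pixel.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _get(img: Binary, r: int, c: int) -> int:
    """Read a pixel, treating everything outside the image as background (0)."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def dilate(img: Binary, mask=CROSS) -> Binary:
    """A pixel becomes 1 if any pixel under the mask is 1 (boundaries extend)."""
    return [[int(any(_get(img, r + dr, c + dc) for dr, dc in mask))
             for c in range(len(img[0]))] for r in range(len(img))]


def erode(img: Binary, mask=CROSS) -> Binary:
    """A pixel stays 1 only if every pixel under the mask is 1 (boundaries shrink)."""
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in mask))
             for c in range(len(img[0]))] for r in range(len(img))]


def difference(a: Binary, b: Binary) -> Binary:
    """Pixelwise absolute difference of two binary images."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def outer_edge(img: Binary) -> Binary:
    """Edge map from dilation: pixels added around the object."""
    return difference(dilate(img), img)


def inner_edge(img: Binary) -> Binary:
    """Edge map from erosion: object pixels removed by the mask."""
    return difference(img, erode(img))
```

On a filled 3×3 square, `inner_edge` keeps the eight boundary pixels of the square and `outer_edge` marks the twelve background pixels that touch it through the cross, which shows why the two maps differ even though they use the same mask.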

**Fig. 2** Different edge detection methods. **a** Gradient; **b** Laplacian; **c** Gaussian; **d** Mathematical morphology; **e** Conjugate (negative and positive maps)

#### *1.5 Conjugate*

The conjugate scheme has been developed since the 1990s and is based on a full pattern classification of the nearest neighbourhood relationships of discrete states on regular plane lattices under rotation, reflection and translation invariance. This approach can express local patterns as invariant groups such as isolated, inner, block edge and intersection, organising the whole pattern space as a hierarchical construction. Both background and foreground information need to be represented as balanced structures in conjugate phase space. Under certain conditions, it is feasible to use two types of edge maps in representations. In Fig. 2e, two typical edge maps illustrate the conjugate scheme:

Negative map: Lena.NM.50.50.10.bABCDEFGHIJKLac.-4.256.256.gif

Positive map: Lena.PM.50.50.10.bABCDEFGHIJKLac.-4.256.256.gif


From edge detection considerations, different operations emphasise different visual information from images. Simple convolution filters may provide fast processing; however, they are highly sensitive to minor noise levels. Among the three traditional edge detection schemes, the Canny edge detector shows an important characteristic: a series of controllable edge maps with reliable properties. Because distinct edge detectors have different behaviours, it is very hard for users to simply select the best one among the schemes. Mathematical morphology applies discrete masks in operations. Since edge maps normally do not correspond directly to the masks themselves, it is difficult to establish a link between the relevant operations and edge detection results.

An edge detection operation extracts an edge map from a digital image. From this viewpoint, we need to establish a proper model to determine invariant properties among edge detection schemes.

#### **2 Recursive Model of Edge Accuracy**

Different edge detection methods cover various applications, with advantages in many aspects. From a practical viewpoint, it is hard for users to judge which method provides the best edge map for a given application. In the history of edge detection research, no model has provided a general mechanism for systematic comparison among distinct methods. Since the target of every edge detection method is to create edge maps, it is natural to determine under which conditions the edge maps can represent true edges.

#### *2.1 Question*

Could an extracted edge map be a true edge representation?

From a morphological viewpoint, a true edge map needs to have invariant properties relevant to its geometric and topological constraints. In many theories and practices in relation to dynamic systems and cybernetics, recursive methods and models have proved to be of foundational importance in detailed analysis tasks. A recursive model has been applied in testing edge detection operators to explore their refined properties, as shown in Fig. 1b. Using this feedback mechanism, an edge map is looped back as input to the same type of edge detection operator. The recursive loop acts as an important magnifier that directly identifies dynamic behaviours among input and output pairs.

#### **3 Four Types of Edge Accuracy Measures**

Under the recursive approach, a true edge representation must be its own recursive edge map. Such invariance under recursive operations can be observed as an intrinsic property of the edge detection operators themselves. In addition to invariant properties, many rich effects among input and output pairs need to be considered. To make a proper judgment among recursive results, it is essential to apply four different accuracy measures, shown in Fig. 1c. They are {=, ≈, ≠, Ø}, representing accurate, almost accurate, inaccurate and trivial behaviours, respectively, between input and output edge maps. From the matched results between an extracted edge map and its recursive edge map, it is feasible to determine the category to which the generated results belong. This provides a general model independent of any specific edge detection scheme. Anyone who would like to check which category a particular scheme belongs to can simply apply this recursive mechanism to that method directly.
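The recursive check can be sketched as a small loop that feeds each edge map back into the same operator and labels every input/output pair with one of the four measures. The chapter does not state numerical criteria, so the decisions below are assumptions for illustration only: an all-zero output is treated as trivial (Ø), and "almost accurate" means that at most a fraction `tol` of the pixels changed. The names `classify` and `recursive_measures` are hypothetical.

```python
from typing import Callable, List

Binary = List[List[int]]


def classify(edge_in: Binary, edge_out: Binary, tol: float = 0.05) -> str:
    """Compare an edge map with its recursive edge map.

    Returns '=' (accurate), '≈' (almost accurate), '≠' (inaccurate)
    or 'Ø' (trivial). The emptiness test and the tolerance `tol` are
    assumptions made for this sketch.
    """
    flat_in = [p for row in edge_in for p in row]
    flat_out = [p for row in edge_out for p in row]
    if not any(flat_out):
        return "Ø"                      # the output carries no edge at all
    changed = sum(a != b for a, b in zip(flat_in, flat_out))
    if changed == 0:
        return "="
    return "≈" if changed / len(flat_in) <= tol else "≠"


def recursive_measures(operator: Callable[[Binary], Binary],
                       image: Binary, steps: int = 4) -> List[str]:
    """Apply the operator repeatedly and classify each consecutive pair."""
    current = operator(image)           # the first edge map
    results = []
    for _ in range(steps - 1):
        nxt = operator(current)         # loop the edge map back as input
        results.append(classify(current, nxt))
        current = nxt
    return results
```

Any edge detector that maps a binary image to a binary edge map can be passed as `operator`; a scheme whose results are all `=` reproduces its own edge map and is therefore invariant in the sense described above.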

#### **4 Four Sample Groups of Recursive Edge Maps**

In Fig. 3a–d, four groups of recursive edge maps are generated for illustration. Two operators are selected from Photoshop: Find Edges (Gradient), Fig. 3a, and Trace Contour (Zero Crossing), Fig. 3b. The Find Edges operation has a clear variant property, and Trace Contour shows a flip-flop behaviour after a certain number of operations. One example of recursive results is selected from the Canny edge detector, shown in Fig. 3c. Two sets of examples are shown in Fig. 3d for mathematical morphology. It is interesting to see dilation showing almost invariant properties and erosion creating edge maps similar to zero crossing effects. To show different recursive properties of the conjugate scheme, four sub-operators are illustrated in Fig. 3e1–e4.


**Table 1** Edge detection schemes and their accuracy properties

Each group shows a specific category among the three non-trivial results. In conjugate edge detection operators, two types of controllable parameters are available: meta-shape parameters {A, …, L, a, …, l} and enhanced ratio control {−8, …, 8}. Together, these controllable parameters can provide universal edge representation of the true edge map to support various edge representations under selected operations.

#### **5 Comparison**

Using the five categories, a summary is given in Table 1. This provides a systematic means of comparison.

**Fig. 3** Recursive maps of different edge detection operators. **a** Find edges; **b** Trace edge; **c** Canny edge detection; **d** Morphology; **e** Conjugate edge detection

(A) Photoshop: Find Edges (Gradient). (A1) ≠ (A2) ≠ (A3) ≠ (A4); no invariant edge map available. Recursive condition: the find edges filter is applied directly to each map.

(B) Photoshop: Trace Contour (Zero Crossing). (B1) ≠ (B2) ≠ (B3) ≈ (B4), where (B4) is the 54th map; flip-flop variations after the 53rd operation. Recursive condition: trace contour filter (level = 119, edge = low).

(D) Morphology. (D11) ≠ (D12) ≠ (D13) ≠ (D14); edge maps invariant. Recursive condition: erosion using a crossing mask. (D21) to (D24): the first to the fourth edge maps.

(E) Conjugate Edge Detection. (E12), (E13), (E14): the fifth, 100th and 1000th edge maps. (E21) ≈ (E22) ≈ (E23) ≈ (E24); similar edge maps with noise removed. Recursive condition: NM.50.50.10.abcdefghijklABC.-2. (E31) ≠ (E32) ≠ (E33) ≠ (E34); changed edge maps. Recursive condition: PM.50.50.10.cdefCDEF.-2. A further recursive condition is listed: PM.50.50.10.ABCDEFGHIJKLabc.-2.

#### **6 Conclusion**

Existing edge detection schemes do not have unique recursive maps as their representations. Conjugate technology provides full control to create true edge maps with accuracy and invariance.

True edge maps contain unique shape information of fundamental importance to support all visual applications.


# **2D Spatial Distributions for Measures of Random Sequences Using Conjugate Maps**

**Qingping Li and Jeffrey Zheng**

**Abstract** Advanced visual tools are useful to provide additional information for modern information warfare. 2D spatial distributions of random sequences play an important role in understanding the properties of complex sequences. This chapter presents time sequences generated from a given logical function of 1D cellular automata in both Poincare maps and conjugate maps. Multiple measure sequences of Markov chains can be used to display spatial distributions using conjugate maps. Measure sequences are recursively produced by different logical functions generating maps. A possible complementary feature exists between pairs of functions. Conjugate symmetry relationships between a pair of logical functions in conjugate maps can be observed.

**Keywords** Time sequence · Random property · Cellular automata · Spatial distribution · Conjugate symmetry

## **1 Introduction**

Random sequences are widely used in many security-based applications such as secure communication, cryptographic coding, and information security systems [1]. For proper analysis, Markov chain methodologies and technologies provide a series of important methods and tools to help analysts in the decoding process [2–4]. In modern information warfare, it is essential for analysts to detect and decrypt the opponent's communications using information acquisition toolkits on real coding sequences [5].

Q. Li

School of Software, Yunnan University, Kunming, China e-mail: lqpbupt@126.com

J. Zheng (B)

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

This work was supported by Yunnan Advanced Overseas Scholar Project.

Information Warfare describes terms of "actions" executed to achieve a sought outcome—denial, exploitation, corruption and destruction of an opponent's "information" and related functions, and prevention of such "actions" executed by an opponent [6].

The battle between the obscurers and those who sought to break the codes has been a continual one, but it reached a new level of stature and importance during World War II with the decryption of Germany's Enigma messages. Historic events have proved that statistical and probability tools are extremely important in information warfare applications. This battle of wits fought by British mathematicians and statisticians shortened World War II and ushered in the age of information warfare [7].

A prerequisite for executing these attack actions is a thorough understanding of the information encryption mechanism that the opponent uses [8]. In information warfare, secured communications among opposing parties may use public networks, so it is feasible to capture relevant information for further analysis. Different quantitative tools and methods are useful to provide additional information in the decoding process. Variant features play an important role in the measurement and analysis of random sequences [9].

Because the functions that generate random sequences are expressed implicitly, it is hard to obtain the characteristics of random sequences from the function and coding sequences themselves [10]. Traditionally, the time sequence map and the Poincare map are the two most popular methods for capturing the measure features of a random sequence in two dimensions [11]. From a visual viewpoint, current Markov chain schemes do not provide an efficient visual mechanism to display multiple measurement sequences from the spatial characteristics of complex random sequences.

To extract further information from random sequences, this chapter establishes a visual system to illustrate multiparameter measurement sequences of Markov chains as conjugate maps. For a given set of measurement sequences, the conjugate map proposed in this chapter can provide more refined information on distributed structure than present map technologies [12].

In the second section, the respective characteristics of traditional methods and the conjugate method are discussed. The measurement mechanism for the spatial characteristics of logical functions, comprising the disposal model, measuring model and visualizing model, is described in the third section. The resulting maps and their analysis are discussed in the fourth and fifth sections, and concluding remarks are provided in the last section.

#### **2 Traditional Methods and Conjugate Method**

In this section, two typical traditional methods, the time sequence map and the Poincare map, are discussed for comparison.

Time sequence map generates a 2D coordinate; *X*-axis is determined by the time scale *t,* and *Y*-axis is determined by the value of measured parameter *f* (*t*), as shown in Fig. 1a.

**Fig. 1** Simple time sequence map and Poincare map; **a** Time sequence map, **b** Poincare map

The measure sequence $\{f(t)\}_{t=0}^{T-1}$ of length *T* can form a Poincare map according to a matching pattern that takes data correlation into account. The Poincare method maps one group of measures of a time sequence to a 2D map and detects the spatial distribution of the sequence through the distribution of the point cluster. In the Poincare map, the *X*-axis is determined by the value of *f*(*t*) and the *Y*-axis by *f*(*t* + *l*). When *l* = 1, it is a map of vicinity-related patterns, as shown in Fig. 1b.
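The two traditional constructions can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function names `time_sequence_points` and `poincare_points` and the sample values are hypothetical.

```python
def time_sequence_points(f):
    """Points (t, f(t)) of a time sequence map."""
    return list(enumerate(f))


def poincare_points(f, lag=1):
    """Points (f(t), f(t + lag)) of a Poincare map for one measure sequence."""
    if lag < 1 or lag >= len(f):
        raise ValueError("lag must satisfy 1 <= lag < len(f)")
    return [(f[t], f[t + lag]) for t in range(len(f) - lag)]


# Illustrative measure sequence (made-up values, not data from the chapter).
measures = [0.2, 0.5, 0.4, 0.9, 0.1]
print(time_sequence_points(measures))
print(poincare_points(measures, lag=1))
```

With `lag=1` each point pairs a measure with its immediate successor, which corresponds to the vicinity-related pattern described above.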

Unlike the Poincare method, which is based on one group of measures, the new map proposed in this chapter chooses two groups of measures from relevant parallel measurement sequences. Because two different groups of measures act simultaneously, the values on the two axes are determined by these two groups of measurements. It is convenient to name the new map a conjugate map to denote this kind of multiple-parameter measurement map.

#### **3 Generate and Measure Mechanism of Time Sequence**

In this section, the Cellular Automata (CA) method is applied to generate a time sequence and then to produce the accompanying measurement sequences. First, the initial sequence is input, and the output sequence is generated by a given logical function using a 1D cellular automaton. From these data, measurements are formed by probability measurement on the pairs of input and output sequences. Finally, the generated measure sequences are used to construct a 2D conjugate map showing the 2D spatial distribution of the time sequence. The processing flow of the mechanism is shown in Fig. 2.

**Fig. 2** Flowchart of the generation and measurement mechanism of time sequences

**Table 1** I/O pattern of disposal model


**Table 2** Exhaustion of initial input sequences


#### *3.1 Disposal Model*

Consider a logical function *f* as a function of a CA. For any initial input sequence $\{X_i\}_{i=0}^{N-1}$ of *N* bits, the function generates an output sequence $\{Y_i\}_{i=0}^{N-1}$ of equal length. The I/O pattern is shown in Table 1.

All $2^N$ states of the *N*-bit initial input sequence are generated exhaustively, and the corresponding output sequence under the logical function $f : X \to Y$ is generated for each of them. Each input sequence and its output sequence form one group, so there are $2^N$ groups of corresponding relationships [13]. The exhaustion of all the initial input sequences is shown in Table 2.
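The exhaustive generation of I/O groups can be sketched as follows. The chapter does not spell out the neighbourhood used by the 1D cellular automaton, so this sketch assumes that each output bit is a two-variable function of a cell and its right neighbour with cyclic wrap-around; the rule table `XOR_TABLE` and the names `ca_step` and `exhaustive_pairs` are illustrative choices, not taken from the source.

```python
from itertools import product

# Assumed two-variable rule (XOR); any of the 16 two-variable functions could be used.
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def ca_step(bits, table):
    """Apply the rule to every (cell, right neighbour) pair, wrapping around."""
    n = len(bits)
    return tuple(table[(bits[i], bits[(i + 1) % n])] for i in range(n))


def exhaustive_pairs(n, table):
    """Return all 2**n groups (X, Y) with Y = f(X)."""
    return [(x, ca_step(x, table)) for x in product((0, 1), repeat=n)]


pairs = exhaustive_pairs(4, XOR_TABLE)
print(len(pairs))  # number of groups for N = 4
```

The number of groups doubles with every additional bit, so the exhaustive approach is only practical for short sequences such as the *N* = 13 case used later in the chapter.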

#### *3.2 Measure Model*

The basic measurement model establishes the transformation relation between the input sequence $\{X_i\}_{i=0}^{N-1}$ and the output sequence $\{Y_i\}_{i=0}^{N-1}$ for each group.

In the transformation $f : X_i \to Y_i$, $0 \le i < N$, there are four types of transformations in total; each type determines a number, and the corresponding relationships are shown in Table 3. This type of measurement structure corresponds directly to the Markov chain mechanism [4].


**Table 3** Measure parameters


Consider $j \in \{0, 1, 2, \ldots, 2^N - 1\}$ as the serial number of the different initial input sequences. Four measurements with Markov chain properties can be identified from the measurement parameters above, as shown in Table 4.

For the different initial input sequences, four groups of measurements can be generated on the corresponding I/O sequences: $\{P_{00}(j)\}_{j=0}^{2^N-1}$, $\{P_{01}(j)\}_{j=0}^{2^N-1}$, $\{P_{10}(j)\}_{j=0}^{2^N-1}$, and $\{P_{11}(j)\}_{j=0}^{2^N-1}$.
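Since the tables that define the measures are not reproduced here, the following sketch assumes the simplest reading: $P_{ab}(j)$ is the fraction of positions *i* at which $X_i = a$ and $Y_i = b$ in the *j*th I/O group. The function name `transition_probabilities` and the sample vectors are hypothetical.

```python
from collections import Counter


def transition_probabilities(x, y):
    """Return P00, P01, P10, P11 for one input/output pair (assumed definition)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    counts = Counter(zip(x, y))
    n = len(x)
    return {f"P{a}{b}": counts[(a, b)] / n for a in (0, 1) for b in (0, 1)}


# Illustrative pair: each of the four transformations occurs exactly once.
x = (1, 0, 0, 1)
y = (1, 0, 1, 0)
print(transition_probabilities(x, y))
```

Under this assumption the four values always sum to 1 for each group, so any two of them carry information that the other two constrain but do not fully determine.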

#### *3.3 Visualization Model*

Based on the probability measurements presented above, two measurements are chosen to construct a 2D map. Because two different groups of measurements are used simultaneously, this kind of map is named a conjugate map; the values on its two axes are determined by these two groups of measurements.

According to the construction pattern introduced above, there are $C_4^2 = 6$ different combinations: $\{P_{00}(j), P_{01}(j)\}$, $\{P_{00}(j), P_{10}(j)\}$, $\{P_{00}(j), P_{11}(j)\}$, $\{P_{10}(j), P_{11}(j)\}$, $\{P_{01}(j), P_{11}(j)\}$, and $\{P_{01}(j), P_{10}(j)\}$.

For the same group of sequences, a 2D conjugate map is constructed for each of the combinations above, as shown in Fig. 3.

This chapter chooses the typical combination $\{P_{01}(j), P_{10}(j)\}$ to construct 2D conjugate maps that detect the spatial distribution of time sequences for the *N* = 13 condition.

**Fig. 3** 2D conjugate maps constructed from the six pairs of measures of the No. 6 function; *N* = 13
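A conjugate map can be sketched end to end under stated assumptions: the cellular automaton pairs each cell with its right neighbour cyclically, the rule is XOR, and $P_{ab}$ is the fraction of positions with $X_i = a$ and $Y_i = b$. None of these details is fixed by the text, and the names `ca_step`, `measures`, and `conjugate_points` are hypothetical. A small *N* is used to keep the run short.

```python
from itertools import combinations, product

# Assumed rule table: XOR of a cell and its right neighbour.
RULE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
MEASURES = ("P00", "P01", "P10", "P11")


def ca_step(x):
    n = len(x)
    return tuple(RULE[(x[i], x[(i + 1) % n])] for i in range(n))


def measures(x, y):
    """Fractions of the four bit transformations (assumed definition)."""
    n = len(x)
    return {f"P{a}{b}": sum(1 for p in zip(x, y) if p == (a, b)) / n
            for a in (0, 1) for b in (0, 1)}


def conjugate_points(n, first="P01", second="P10"):
    """One point per initial sequence j: (first measure, second measure)."""
    points = []
    for x in product((0, 1), repeat=n):
        m = measures(x, ca_step(x))
        points.append((m[first], m[second]))
    return points


pairs = list(combinations(MEASURES, 2))
print(pairs)                      # the six possible axis pairs
points = conjugate_points(6)      # N = 6 instead of 13, for brevity
print(len(points), len(set(points)))
```

Plotting `points` as a scatter plot gives the conjugate map; many initial sequences fall on the same coordinate, which is why the maps appear as clusters.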

#### **4 Visualization Result**

Because the structural complexity of the logical functions must be restricted, the 16 functions of two variables are examined exhaustively [14]. Output sequences are generated from different initial input sequences under a given logical function, and various measurement data are then obtained from the corresponding I/O sequences by the probability method. The maps are constructed from these measurement data.

This chapter chooses functions No. 1, 5, 6, and 13 as typical examples; the characteristics of the three kinds of maps are shown in Fig. 4.

In the (a) group of time sequence maps, only one measurement sequence varies with time.

In the (b) group of Poincare maps, different functions form different point clusters.

In the (c) group of conjugate maps, the distribution of the point clusters has clearly polarized properties.

According to the variable-value logic theory, three kinds of encoding model can be distinguished: W, F, and C [15].

The visualization information that can be acquired from a single function's map is rather limited. In order to compare the spatial property of different logical functions, a 4 × 4 array is constructed using the maps that are generated from 16 logical functions in different encoding patterns as shown in Fig. 5.

**Fig. 4** Time sequence maps, Poincare maps, and 2D conjugate maps. **a** Time sequence map; **b** Poincare map; **c** 2D conjugate map

By assembling the maps of all 16 logical functions under these models, the overall structural information among the logical functions can be observed.

For convenient comparison, combinations of the 16 recursive images generated from the 16 functions are given in this chapter under the different codes. Recursive images in W-code, F-code, and C-code from a given initial sequence are shown in Figs. 6, 7, and 8, respectively.


**Fig. 5** Assemble pattern of maps in W-code, F-code, and C-code

**Fig. 6** Recursive images in W-code

The combination of time sequence maps is shown in Fig. 9. The figure shows that different functions have different distribution properties and also reveals how a single measurement varies with time.

The combination of Poincare maps in W-code is shown in Fig. 10. Different distribution properties of the functions can be observed from the figure. It is clear that four groups of configurations appear in the figure: {0, 8, 2, 10}, {1, 3, 9, 11}, {4, 6, 12, 14}, and {5, 7, 13, 15}.

**Fig. 7** Recursive images in F-code

For W-code, the Poincare maps are shown in Fig. 10 and the corresponding 2D conjugate maps are shown in Fig. 11. The conjugate maps have polarized properties, and the function pairs 0:15, 1:7, 2:11, 4:13, and 8:14 have conjugate symmetry. In general, the 16 conjugate maps differ from the corresponding Poincare maps.

Arranged by the F-code structure, the 16 Poincare maps and conjugate maps are shown in Figs. 12 and 13, respectively.

Under C-code structure, Poincare maps and conjugate maps are shown in Figs. 14 and 15.

In the above maps, 2D conjugate maps not only show spatial distributions of different logical functions but also have special holistic symmetries under the F- and C-code conditions.

#### **5 Analysis**

Through three types of different maps, three different coding schemes can be observed.

**Fig. 8** Recursive images in C-code

The time sequence map can show the simple trend of a single measurement series over time, but it is difficult for this scheme to describe the spatial distribution of a time sequence.

The Poincare map applies to a single measurement sequence. Although the map can be generated with different correlation lengths, the distribution information is naturally limited by the selected measurement sequence.

A 2D conjugate map uses two groups of independent measurements simultaneously. This scheme can show the differences and connections between the spatial distributions of logical functions; furthermore, through the different coding models, it can illustrate holistic relationships among different functions. For example, the function pairs 0:15, 1:7, 2:11, 4:13, and 8:14 have clear conjugate symmetry in the conjugate maps. In addition, under the C-code condition, the points of the four functions on each edge of the array lie on the same side of their maps. For example, the point clusters of functions (0, 4, 1, 5), (0, 2, 8, 10), (10, 14, 11, 15), and (5, 7, 13, 15) are located on the four sides of the 2D map space, respectively.

**Fig. 9** Time sequence maps of 16 functions constructed by {*t, P*0−1(*t*)} sequences

**Fig. 10** Poincare maps in W-code

**Fig. 11** Conjugate maps in W-code

**Fig. 12** Poincare maps in F-code

**Fig. 13** Conjugate maps in F-code

**Fig. 14** Poincare maps in C-code

**Fig. 15** Conjugate maps in C-code

#### **6 Conclusion**

Refined properties of various time sequences can be identified from 2D conjugate maps, which illustrate multiple measurement sequences under the Markov chain mechanism. The spatial properties of a time sequence play an important role in the study of the behavior of dynamic sequences, and the stable distributions revealed by the visualization method can help people understand relevant issues.

Compared with Poincare maps, conjugate maps reveal additional properties of complex dynamic sequences. The conjugate map method uses multiple parameters of Markov chains to make independent measurements simultaneously.

The proposed technology can provide further structural information among multiple measurements, and refined relationships can be established via spatial distributions. The scheme may use statistical and probability methodologies to enhance visual tools for Markov chain mechanisms and to address real problems and requirements of modern information warfare and information security applications in the near future.

**Acknowledgements** Thanks go to Mr. Jie Wan for helping to generate the data for this study, and to the special fund of Information Security (No. 2010KS06), Software School of Yunnan University, for funding the project.

#### **References**



# **Permutation and Complementary Algorithm to Generate Random Sequences for Binary Logic**

**Jie Wan and Jeffrey Zheng**

**Abstract** Random number generation plays a key role in network, information security, and IT applications. In this chapter, a permutation and complementary algorithm is proposed that uses vector complementary and permutation operations to extend the *n*-variable logic function space from $2^{2^n}$ functions to $2^{2^n} \times 2^n!$ configurations for the variant logic framework. Each configuration contains $2^{2^n}$ functions that can be shown in a $2^{2^{n-1}} \times 2^{2^{n-1}}$ matrix. A set of visual results is presented through their symmetric properties in W, F, and C codes, respectively, to provide essential support for the variant logic framework.

**Keywords** Logic function · Permutation and complementary · Variant logic · Symmetric distribution · Random sequence

# **1 Introduction**

Random numbers play an important role in many network protocols and encryption schemes in various network security applications [1], for example, digital signatures, authentication protocols, key generation for PKI, RSA/AES [2], nonces, and symmetric stream encryption. A better random number algorithm will enhance encryption schemes as well as other applications. To satisfy different requirements, NIST has published a series of statistical tests as standards [3] to determine whether a random number generator is suitable for a cryptographic application. By applying the vector complementary and permutation operations to binary logic, the variant logic framework extends the traditional logic function space from $2^{2^n}$ functions to $2^{2^n} \times 2^n!$ configurations [4]. Under the new extension conditions, it is possible to use simple transformations to generate huge numbers of random sequences for future applications.

J. Wan Yunnan University, Kunming, China e-mail: wanjiech@163.com

J. Zheng (B) Key Laboratory of Yunnan Software Engineering, Yunnan University, Kunming 650091, Yunnan, China e-mail: conjugatelogic@yahoo.com

Project supported by Yunnan Advanced Overseas Scholar Project, NSF of China (61362014).

A permutation and complementary algorithm is described in this chapter to express different random properties through a series of binary image sequences produced by typical recursive operations.

#### **2 Method**

Cellular automata provide a natural way to generate random sequences. The principle of binary cellular automata [5, 6] can be explained by the following example:

First, a sequence 001100 and a function f : {00 → 0, 01 → 1, 10 → 1, 11 → 0} are selected.

Second, the sequence is decomposed from left to right into overlapping pairs of adjacent bits. The last bit is joined to the first bit so that the sequence wraps around:

$$
001100 \;\longrightarrow\; 0\,001100
$$

Third, according to the decomposed sequences and the generating function, a new sequence 010100 can be generated, i.e., f : 001100 → 010100.
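The three steps can be sketched in Python. The indexing convention below, in which each bit is paired with its right neighbour and the sequence wraps around, is an assumption; it is chosen because it reproduces the output 010100 stated in the text, and other conventions differ from it only by a rotation of the output. The name `ca_step` is hypothetical.

```python
# Generating function from the example: 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0.
RULE = {"00": "0", "01": "1", "10": "1", "11": "0"}


def ca_step(seq, rule=RULE):
    """Map every cyclic pair (seq[i], seq[i + 1]) through the rule."""
    extended = seq + seq[0]  # wrap around so the last bit has a neighbour
    return "".join(rule[extended[i:i + 2]] for i in range(len(seq)))


print(ca_step("001100"))  # 010100
```

The output has the same length as the input, so the function can be applied again to its own result.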

Following this algorithm, the space of generating functions can be extended further, and large numbers of random sequences can be generated. This mechanism can increase the complexity of code breaking.

In the variant logic framework, the logic function space is extended from $2^{2^n}$ to $2^{2^n} \times 2^n!$ by the permutation and complementary operations. For two-variable functions of cellular automata, there are 16 generating functions, which can be described in a truth table (Fig. 1a) with 16 entries.

#### *2.1 Permutation Operation*

The bit strings of the states {00, 01, 10, 11} in the generating function can be converted to the decimal numbers {0, 1, 2, 3}. The example in Fig. 1b permutes the column order 3210 of the table to 1320.


**Fig. 1** Permutation example

#### *2.2 Complementary Operation*

In the complementary operation, the complementary vector σ is applied to the truth table.

For each column with truth value *y*, where δ denotes the bit of σ that belongs to this column, the operation can be described as

$$y^{\delta} = \begin{cases} y, & \delta = 1 \\ \overline{y}, & \delta = 0 \end{cases}$$

In two-variable variant logic, σ is a binary sequence of 4 bits in {0000, ..., 1111}. In the example, the original table corresponds to σ = 1111 and is shown in Fig. 2a. Given σ = 1100, as in Table 2, the operation can be written as 1320<sup>(1100)</sup> = 1<sup>1</sup>3<sup>1</sup>2<sup>0</sup>0<sup>0</sup>. Under this operation, the values in the columns of states 1 and 3 are invariant, whereas the values in the columns of states 2 and 0, whose σ bits are 0, are changed to their complemented values.

After the complementary operation, Fig. 2a changes to Fig. 2b.
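The combined effect of the two operations on one row of the truth table can be sketched as follows. The sketch assumes that function number *k* has, in the original column order 3210, the 4-bit binary representation of *k* as its truth row; this numbering is an assumption, although it reproduces the value 1010 that Sect. 2.4 quotes for the third function. The name `transform` is hypothetical.

```python
def transform(k, perm=(1, 3, 2, 0), sigma="1100"):
    """Return the truth row of function k after permutation and complement.

    perm  : new left-to-right order of the state columns
    sigma : one bit per column; 1 keeps the value, 0 complements it
    """
    # value[s] is the output of function k on state s (assumed numbering).
    value = {s: (k >> s) & 1 for s in range(len(perm))}
    row = []
    for state, keep in zip(perm, sigma):
        bit = value[state]
        row.append(str(bit if keep == "1" else 1 - bit))
    return "".join(row)


print(transform(3))                          # 1010
print(transform(3, (3, 2, 1, 0), "1111"))    # 0011, the unchanged row
```

Because the permutation only reorders columns and the complement only flips them, the 16 functions still map to 16 distinct rows.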

#### *2.3 Visualization*

Once the function f has been applied to the sequence 001100 to output 010100, it can be applied again to the sequence 010100 to output 111100. For such binary sequences, black is selected for 1 and white for 0 to generate visual patterns (Fig. 3).
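The recursive application and the black/white rendering can be imitated in text form. This sketch assumes the same cyclic right-neighbour pairing that reproduces the two outputs quoted above; `ca_step` and `render` are hypothetical names, and `#` and `.` stand in for black and white pixels.

```python
RULE = {"00": "0", "01": "1", "10": "1", "11": "0"}


def ca_step(seq):
    extended = seq + seq[0]  # cyclic wrap-around (assumed convention)
    return "".join(RULE[extended[i:i + 2]] for i in range(len(seq)))


def render(seed, steps, on="#", off="."):
    """Return one text row per generation, starting with the seed."""
    rows = [seed]
    for _ in range(steps):
        rows.append(ca_step(rows[-1]))
    return ["".join(on if bit == "1" else off for bit in row) for row in rows]


for line in render("001100", 2):
    print(line)
# ..##..
# .#.#..
# ####..
```

Stacking more generations row by row produces the kind of binary image that the chapter calls a recursive image.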

#### *2.4 Matrix Representation*

For example, in Fig. 2b the truth value of the third function is 1010. It can be converted to the binary coordinate 10|10, split into the left two and the right two bits, so the decimal coordinate is 2|2. In this way, Fig. 2b can be converted to Table 1.

Under such conversion, the 2D matrix can be represented in Table 2.
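The conversion from transformed truth rows to matrix positions can be sketched for the configuration 1320 with σ = 1100. The sketch assumes that function *k* has the binary representation of *k* as its original truth row in column order 3210, and that the left two bits give the row index and the right two bits the column index; both conventions are assumptions, and the names `transformed_row` and `matrix_layout` are hypothetical.

```python
def transformed_row(k, perm, sigma):
    """Truth row of function k after column permutation and complement."""
    value = {s: (k >> s) & 1 for s in range(4)}
    return "".join(
        str(value[s] if keep == "1" else 1 - value[s])
        for s, keep in zip(perm, sigma)
    )


def matrix_layout(perm=(1, 3, 2, 0), sigma="1100"):
    """Place each of the 16 functions at (left two bits, right two bits)."""
    grid = [[None] * 4 for _ in range(4)]
    for k in range(16):
        row = transformed_row(k, perm, sigma)
        grid[int(row[:2], 2)][int(row[2:], 2)] = k
    return grid


grid = matrix_layout()
for line in grid:
    print(line)
corners = {grid[0][0], grid[0][3], grid[3][0], grid[3][3]}
print(sorted(corners))
```

Under these assumptions function 3 lands at coordinate 2|2, as in the example above, and the four corners are occupied by functions 0, 5, 10, and 15, which agrees with the vertex functions named later in the result analysis.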

#### **3 Algorithm and Properties**

#### *3.1 Permutation and Complementary Algorithm*

Using the permutation and complementary operations, an algorithm is developed to express the *n*-ary variant logic function space.


**Table 1** Coordinate map of

**Table 2** 2D matrix of the 1320(1100)


Algorithm: Permutation and Complementary

Input: variable *n*

Output: the set of truth tables $P^{\sigma}$, $\forall P \in S(2^n)$, $\forall \sigma \in B_2^{2^n}$.

Method:

Step 1. Initialize $T = \{2^n - 1, \ldots, 1, 0\}$.

Step 2. Generate a permutation *P* of *T*.

Step 3. For σ from 000...0 to 111...1, perform the vector complementary operation.

Step 4. Is there any new permutation? If yes, go to Step 2.

Step 5. End.

Here $S(N)$ is the symmetric group on *N* elements and $B_2^M$ is an *M*-variable Boolean structure with $2^M$ members.
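The loop structure of the algorithm can be sketched with the standard library. This is an illustrative enumeration only: it yields the pairs (*P*, σ) and leaves the construction of each truth table to the caller. The generator name `configurations` is hypothetical.

```python
from itertools import permutations, product
from math import factorial


def configurations(n):
    """Yield every (permutation, sigma) pair for n-variable logic."""
    states = tuple(range(2 ** n - 1, -1, -1))        # Step 1: T = (2^n - 1, ..., 1, 0)
    for perm in permutations(states):                # Steps 2 and 4: every permutation
        for sigma in product((0, 1), repeat=2 ** n): # Step 3: sigma from 00..0 to 11..1
            yield perm, sigma


count = sum(1 for _ in configurations(2))
print(count, factorial(4) * 2 ** 4)  # both 384 for n = 2
```

For *n* = 2 this gives 4! × 2<sup>4</sup> = 384 configurations; for *n* = 3 the count already exceeds ten million, so a generator is used rather than a list.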


#### *3.2 Representation Scheme*

Every truth table has a 2D matrix in which the visual results of random sequences are arranged. The pair (*X*, *Y*) is the coordinate at which each visual result is placed. For the *n*-ary logic function space, the 2D matrix therefore has a size of $2^{2^{n-1}} \times 2^{2^{n-1}}$, as shown in Table 3.

F 128 C 16

#### *3.3 W, F, and C*

Three coding schemes can be distinguished in the algorithm.

W code [4] is a binary sequence of $2^n$ bits. It is separated into two parts, $J^1 | J^0$, each of which has $2^{n-1}$ bits.

F code is a subset of W code and is a symmetric code. In an F code, if the *I*th meta-state in $J^1$ is 1 or 0, the *I*th meta-state in $J^0$ is its negation (bitwise complement).

If, in an F code, the *I*th meta-states in $J^1$ have the same value and, in addition, the four corners of its matrix are included in $\{0, x, \bar{x}, 1\}$, it is a C code [4].

For example, (32|10)(1110|0100) is an element of W code: in the sequence, 1 is not the negation of 3, and 0 is not the negation of 2. (32|01)(1110|0001) is an F code with the symmetry property: 0 is the negation of 3, and 1 is the negation of 2. (13|20)(0111|1000) is a C code: it has the symmetry property of an F code, and the four corners of the matrix of 1320 are included in $\{0, x, \bar{x}, 1\}$.

The further definition of W, F, and C codes can be found in [4].

From an exhaustive enumeration of the binary variant function space, the numbers of W, F, and C codes [7] are shown in Table 4.

#### **4 Coding Samples**
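The symmetry condition of the F code can be checked mechanically on the permutation part. The sketch below assumes that the F property depends only on the permutation: the *I*th state in the right half must be the bitwise complement of the *I*th state in the left half, which for states numbered 0 to 2<sup>n</sup> − 1 means that the two numbers sum to 2<sup>n</sup> − 1. The function name `is_f_permutation` is hypothetical, and the C-code corner condition is not modelled.

```python
from itertools import permutations


def is_f_permutation(perm):
    """True if each state in J0 is the bitwise complement of its partner in J1."""
    half = len(perm) // 2
    top = len(perm) - 1  # complement of state s is top - s
    return all(perm[i] + perm[i + half] == top for i in range(half))


print(is_f_permutation((3, 2, 1, 0)))  # (32|10): False, a general W code
print(is_f_permutation((3, 2, 0, 1)))  # (32|01): True
print(is_f_permutation((1, 3, 2, 0)))  # (13|20): True

f_perms = [p for p in permutations(range(4)) if is_f_permutation(p)]
print(len(f_perms), len(f_perms) * 2 ** 4)
```

Under this assumption, 8 of the 24 permutations satisfy the condition for *n* = 2; combined with the 16 possible values of σ this gives 128 configurations, which agrees with the value 128 that appears for F in the text.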


W Code: Permutation sequence: 3210 The value of σ:1011

**Fig. 4** The 2D matrix diagram and the visual result of 3210<sup>1011</sup>

F Code: Permutation sequence: 3201 The value of σ: 1111


**Fig. 5** The 2D matrix diagram and the visual result of 3201<sup>1111</sup>


C Code: Permutation sequence: 1320 The value of σ: 1100

**Fig. 6** The 2D matrix diagram and the visual result of 1320<sup>1100</sup>

#### **5 Result Analysis**

In Fig. 4, a W code is shown as a general code. The majority of W codes do not have an apparent symmetry property. W codes cover the entire code space formed from the binary input variables. These properties can be seen in Fig. 4.

All the F codes have overall symmetry in their 2D distribution. Obvious symmetry among the functions in the 2D matrix can be observed in Fig. 5.

A sample C code is shown in Fig. 6. C codes form a small subset of F codes with a complete symmetry property. A C code has the four-constant-vertex property: its four vertexes are occupied by the functions 0, 15, 10, and 5, respectively.

In the permutation and complementary algorithm for *n*-ary logical functions, there are $2^n!$ permutations, and the exhaustive complementary operation requires $2^{2^n}$ operations for each permutation. The total computational complexity of an *n*-ary variant logical function using the permutation and complementary algorithm is therefore $O(2^n! \times 2^{2^n})$.
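As a worked check of this growth rate (plain arithmetic, not a result taken from the source), the number of configurations $2^n! \times 2^{2^n}$, with $2^n!$ read as $(2^n)!$, is for small *n*:

$$
\begin{aligned}
n = 1:&\quad 2! \times 2^{2} = 2 \times 4 = 8 \\
n = 2:&\quad 4! \times 2^{4} = 24 \times 16 = 384 \\
n = 3:&\quad 8! \times 2^{8} = 40320 \times 256 = 10\,321\,920
\end{aligned}
$$

Exhaustive enumeration is therefore practical only for very small *n*.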

#### **6 Conclusion**

A permutation and complementary algorithm has been proposed for *n*-ary logical functions, and sample results have been visualized. The visual results of W, F, and C codes, with their variant and invariant properties, support the variant logic system experimentally: an algorithmic mechanism can be used to generate a huge number of random sequences.

## **References**



# **3D Visual Method of Variant Logic Construction for Random Sequence**

**Huan Wang and Jeffrey Zheng**

**Abstract** As Internet security threats continue to evolve, various encryption and decryption schemes are used in the channel coding and decoding of data communication to ensure the security of information transmission. Because cryptography requires a very high degree of apparent randomness, random sequences play an important role in it. Both Cellular Automata (CA) and RC4 contain pseudorandom number generators, each of which may have its own intrinsic properties. In this chapter, a 3D visualization model, 3DVM, is proposed to display the spatial characteristics of random sequences from CA or from an RC4 keystream. The key components of this model and its core mechanism are described, and every module and its I/O parameters are discussed. A series of CA logic functions is selected as examples and compared with some RC4 keystreams to show their intrinsic properties in three-dimensional space. The visual results are briefly analyzed to explore these intrinsic properties, including similarities and differences. The results support exploring the RC4 algorithm with 3D visualization tools that organize its interactive properties as visual maps.

**Keywords** Pseudorandom sequence · CA · Stream cipher · RC4 keystream · 3D maps

# **1 Introduction**

Wireless Sensor Networks (WSN) and Wireless Networks (WN) are among the most popular and widely used types of network of this era. Because of their openness, these types of networks are not very secure. To provide security over WSN and WN, the algorithm used must be fast enough to encrypt and decrypt data in a comparatively short time while requiring few resources. For this purpose, the Wi-Fi Protected Access (WPA) and Wired Equivalent Privacy (WEP) protocols are used as standards. These standards adopted the RC4 stream cipher algorithm to secure data in the WN environment because RC4 encrypts and decrypts data quickly, uses few hardware resources during processing, and is easy to implement [1, 2]. At present, the RC4 algorithm is not secure in many respects; many weaknesses and attacks have been found through cryptanalysis [3, 4].

H. Wang Yunnan University, Kunming, China e-mail: lights127@gmail.com

J. Zheng (B) Key Laboratory of Yunnan Software Engineering, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

Project supported by NSF of China (613620214), the Key R&D project of Yunnan Higher Education Bureau (K1059178) and Yunnan Advanced Overseas Scholar Project (W8110305).

#### *1.1 The Weakness of RC4*

RC4 is a stream cipher, a class of symmetric cipher algorithms. Typically, in a stream cipher, the keystream is the sequence that is combined digit by digit with the plaintext sequence to obtain the ciphertext sequence; the data encryption is thus equivalent to a simple XOR with the keystream. The keystream is generated by a finite state automaton called the keystream generator [5, 6]. The encryption can be broken if several plaintexts are encrypted using the same keystream. The security of RC4 therefore rests entirely on the keystream produced by the RC4 keystream generator.

Because it is very hard to trace the characteristics of keystream generators directly, the random characteristics of a keystream can be investigated through its spatial characteristics in order to test pseudorandom sequences. This chapter extends the work of Qingping Li [7] from 2D to 3D. Random sequences collected from given keystreams are compared with random sequences generated by sample logic functions of 1D Cellular Automata to show their intrinsic properties in a three-dimensional space of relationships.
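The danger of keystream reuse follows directly from the XOR structure and can be shown with a few bytes. The keystream bytes and the two messages below are arbitrary illustrative values, and `xor_bytes` is a hypothetical helper.

```python
def xor_bytes(a, b):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


# Arbitrary illustrative keystream, reused for two messages of equal length.
keystream = bytes.fromhex("eb9f7781b734ca72a7")
p1 = b"attack at"
p2 = b"defend it"

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: c1 XOR c2 equals p1 XOR p2.
print(xor_bytes(c1, c2) == xor_bytes(p1, p2))
# Knowing one plaintext then reveals the other without the key.
print(xor_bytes(xor_bytes(c1, c2), p1))
```

An eavesdropper who obtains both ciphertexts therefore learns the XOR of the plaintexts, and any known or guessed plaintext fragment exposes the corresponding fragment of the other message.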

#### *1.2 CA*

Cellular Automata are a great discovery of the twentieth century. A cellular automaton forms a time series by iterating a given function, introducing logic functions and related calculation methods in a natural pattern [8]. In 1985, S. Wolfram constructed a sequential cipher from pseudorandom sequences generated by logic calculation using cellular automata. Because the logic function is expressed implicitly, its spatial characteristics cannot be observed directly from the function formula [9].

## **2 Architecture**

#### *2.1 Architecture*

The architecture is shown in Fig. 1a. The three main components and their modules are shown in Fig. 1b–d, respectively.

In the first part of this system, two types of data sets are generated by CACM and RC4KCM, respectively. The data sets from either CACM or RC4KCM enter the MM module as input data. The main function of the VM module is to output the four vectors of variant measurements. Using the unified or non-unified method, six probability measurements are created by the PM module. To establish 3D maps, three vectors of probability measurements are selected from the six probability measurements by the SM module. The three vectors determine a 3D spatial position, and all vectors together generate a 3D map using 3DVM.

**Fig. 1** Variant 3D visualization system and key components

**Fig. 2** Two sets of six 3D maps based on unified model in different conditions; **a**1–**a**3 for the file CA; **b**1–**b**3 for the file RC4

There are six parameters in an input group, three sets of parameters in the intermediate group, and one set of parameters in the output group.

#### **Input Group**:

An integer indicates the serial number of the logic function or the value of the key selected

An integer indicates which model is selected

An integer indicates the number of elements in the binary sequence

An integer indicates the number of elements in a segment

An integer indicates the method of selection mechanism

An integer indicates the control parameter for mapping

#### **Intermediate Group**:

A 0-1 vector generated by CA logic function or RC4 keystream generator

A set of four variant measures

A set of six probability vectors

#### **Output Group**:

3D maps

## *2.2 Computation Model of CA (CMCA)*

The CMCA module is used to measure the features of a logic function based on Cellular Automata (CA). Consider a logic function ƒ: *Y* = ƒ(*X*) as a function of a CA; the output sequence *Y* is generated from the given initial input sequence *X*, each element of which has 2 states. For an *N*-bit initial input sequence, a total of $2^N$ states are generated under the logic function ƒ: *X* → *Y*. A pair of vectors (*X*, *Y*) is collected for each input–output correspondence, so there are $2^N$ groups of this corresponding relationship.

#### **Input Group**:

*X* A 0-1 vector with *N* elements, $X \in B_2^N$


#### **Intermediate Group**:

*Y* A 0-1 vector with *N* elements, $Y \in B_2^N$

#### **Output Group**:

∀*Y* Exhaustive set of all states of *N*-bit vectors with $2^N$ elements

## *2.3 Computation Model of RC4 Keystream (RC4KCM)*

An *L*-bit input keystream *K* is divided into *G* segments of *W* = *L*/*G* bits each, with *G* < *L*. The value of the parameter *G* determines the number of points, and *W* determines the spatial distribution of the output keystream in the phase space.

#### **Input Group**:

A 0-1 vector with L elements generated by RC4 keystream generator


#### **Output Group**:

*G* sets of *W* bits 0-1 vectors

The CMRC4 component takes a 0-1 vector as input and divides it into several segments according to the chosen segmentation strategy. The output of this component is *G* sets of *W*-bit 0-1 vectors.
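The segmentation step can be sketched together with a keystream source. The generator below is the widely published textbook description of RC4 (key scheduling followed by pseudorandom generation) and is included only so that the example is self-contained; it is not taken from this chapter. The most-significant-bit-first bit order and the names `rc4_keystream`, `to_bits`, and `segment` are assumptions.

```python
def rc4_keystream(key, n_bytes):
    """Textbook RC4: key scheduling, then n_bytes of keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(n_bytes):                  # pseudorandom generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def to_bits(data):
    """Expand bytes into a list of 0/1 values, most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def segment(bits, g):
    """Split an L-bit vector into G segments of W = L / G bits each."""
    if g <= 0 or len(bits) % g:
        raise ValueError("G must be positive and divide L")
    w = len(bits) // g
    return [bits[i * w:(i + 1) * w] for i in range(g)]


bits = to_bits(rc4_keystream(b"Key", 16))     # L = 128
segments = segment(bits, 8)                   # G = 8, W = 16
print(len(segments), len(segments[0]))
```

Each segment can then be treated like one CA output vector in the measurement stage, so *G* controls how many points appear in the final map.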

## *2.4 Measure Mechanism (MM)*

The MM component shown in Fig. 1c is composed of three modules: Variant Measure (VM), Probability Measurement (PM), and Selection Mechanism (SM). Three parameters are listed as input signals. Four variant measures are output from the VM module, six probability measurements are created from the variant measures by the PM module, and a set of three interactive projections is selected by the SM module.

#### **Input Group**:

*V* A symbol is selected from four types of transformations {⊥, +, −, T},

*N* An integer indicates the number of elements in an input vector

A 0-1 data vector

#### **Intermediate Group**:

*VM* = *R<sup>V</sup>* A set of four variant measures

*PM* = *P<sup>V</sup>* A set of four probability vectors

#### **Output Group**:

*U* ⊂ *V* A set of three interactive projections under the SM condition, *U* ⊂ *V*

*PM* = *P<sup>U</sup>* A set of three probability vectors

#### *2.5 Variant Measure (VM)*

Considering the transformation of every bit between the input sequence $\{X_i\}_{i=0}^{N-1}$ and the output sequence $\{Y_i\}_{i=0}^{N-1}$, there are a total of four types of transformations: 0→0, 0→1, 1→0, and 1→1 [10, 11].

Define the variant representation as follows:

$$V = \begin{cases} \bot, & X_i = 0,\ Y_i = 0; \\ +, & X_i = 0,\ Y_i = 1; \\ -, & X_i = 1,\ Y_i = 0; \\ \top, & X_i = 1,\ Y_i = 1; \end{cases} \qquad 0 \le i < N, \quad X_i, Y_i \in B_2$$

For any N bit 0-1 vector *X*, *X X*0*X*<sup>1</sup> ... *Xi* ... *X <sup>N</sup>*−1*X <sup>N</sup>* , 0 ≤ *i* ≤ *N*, *Xi* ∈ *<sup>B</sup>*2, *Xi* <sup>∈</sup> *<sup>B</sup><sup>N</sup>* <sup>2</sup> under 2-variable function ƒ, N bit 0-1 output vector *Y*, *Y <sup>Y</sup>*0*Y*<sup>1</sup> ... *Yi* ... *YN*−<sup>1</sup>*YN* , <sup>0</sup> <sup>≤</sup> *<sup>i</sup>* <sup>≤</sup> *<sup>N</sup>*, *Yi* <sup>∈</sup> *<sup>B</sup>*2, *Yi* <sup>∈</sup> *<sup>B</sup><sup>N</sup>* <sup>2</sup> . Let be the variant measure function.

$$\begin{aligned} \Delta(X \to Y) &= \sum\_{i=0}^{N-1} \Delta(X\_i \to Y\_i) = \langle R\_\perp, R\_\star, R\_-, R\_\mp \rangle, \ N = R\_\perp + R\_+ + R\_- + R\_\mp, R\_0 \\ &= R\_\perp + R\_+, R\_1 = R\_- + R\_\mp \end{aligned}$$

Example: *N* = 13, *Y* = ƒ(*X*).

$$\begin{aligned} \mathbf{X} &= 1001011100101 \\ \mathbf{Y} &= 0010110101100 \end{aligned}$$

$$\begin{aligned} \Delta(X \to Y) &= -\ \bot\ +\ -\ +\ \text{T}\ -\ \text{T}\ \bot\ +\ \text{T}\ \bot\ - \\ \langle R\_\perp, R\_+, R\_-, R\_{\text{T}} \rangle &= \langle 3, 3, 4, 3 \rangle, \quad R\_0 = 6, \ R\_1 = 7, \ N = 13 \end{aligned}$$
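The counting in this example can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' implementation; the function names and the ASCII labels `bot`, `plus`, `minus`, `top` (standing for ⊥, +, −, T) are illustrative assumptions.

```python
from collections import Counter

# Labels for the four bit transformations 0->0, 0->1, 1->0, 1->1
# (ASCII stand-ins for the symbols ⊥, +, −, T; an illustrative choice).
SYMBOLS = {(0, 0): "bot", (0, 1): "plus", (1, 0): "minus", (1, 1): "top"}


def variant_sequence(x_bits, y_bits):
    """Map each (X_i, Y_i) pair to its variant label."""
    if len(x_bits) != len(y_bits):
        raise ValueError("X and Y must have the same length")
    return [SYMBOLS[(int(a), int(b))] for a, b in zip(x_bits, y_bits)]


def variant_measure(x_bits, y_bits):
    """Return the counts (R_bot, R_plus, R_minus, R_top)."""
    counts = Counter(variant_sequence(x_bits, y_bits))
    return counts["bot"], counts["plus"], counts["minus"], counts["top"]


X = "1001011100101"
Y = "0010110101100"
r_bot, r_plus, r_minus, r_top = variant_measure(X, Y)
r0, r1 = r_bot + r_plus, r_minus + r_top
print((r_bot, r_plus, r_minus, r_top), r0, r1)  # (3, 3, 4, 3) 6 7
```

The counts depend only on how often each pair occurs, so the order in which positions are visited does not matter.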

Input and output pairs are 0-1 variables, so only four combinations occur. For any given function, the quantitative relationship of {⊥, +, −, T} is derived directly from the input/output sequences, and four meta measures are determined [12].

#### **Input Group**:

*V* A symbol is selected from four types of transformations {⊥, +, −, T},

*N* An integer indicates the number of elements in an input vector

A 0-1 data vector

#### **Output Group**:


## *2.6 Probability Measurement (PM)*

Variant measure parameters and the other three parameters are listed as input signals; the output of probability signals is calculated as eight measurements in two groups by following the given equations.

The first group of probability signal vectors ρ is called a non-unified model and defined as follows:

$$\begin{cases} \rho = \frac{R^{V}}{N} = \frac{\langle R\_{\perp}, R\_{+}, R\_{-}, R\_{\text{T}} \rangle}{N} \\ \rho\_{\alpha} = \frac{R\_{\alpha}}{N}, \ \alpha \in \{\perp, +, -, \text{T}\} \end{cases} \quad \& \quad \begin{cases} \rho\_{0} = \frac{R\_{0}}{N} \\ \rho\_{1} = \frac{R\_{1}}{N} \end{cases}$$

The second group of probability signal vectors ρ˜ is called a unified model and defined as follows:

$$\begin{cases} \tilde{\rho} = \frac{R^{V}}{R\_{0} | R\_{1}} = \frac{\langle R\_{\perp}, R\_{+}, R\_{-}, R\_{\text{T}} \rangle}{R\_{0} | R\_{1}} \\ \tilde{\rho}\_{\alpha} = \frac{R\_{\alpha}}{R\_{0}}, \ \alpha \in \{\perp, +\} \\ \tilde{\rho}\_{\beta} = \frac{R\_{\beta}}{R\_{1}}, \ \beta \in \{-, \text{T}\} \end{cases} \quad \& \quad \begin{cases} \rho\_{0} = \frac{R\_{0}}{N} \\ \rho\_{1} = \frac{R\_{1}}{N} \end{cases}$$

Under such conditions, the output signals of the PM module can be expressed as a pair of probability vectors in quaternion form, *PM* → *P<sup>V</sup>* = {ρ, ρ̃}.

#### **Input Group**:


#### **Output Group**:

*PM* → *P<sup>V</sup>* A set of four probability vectors
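The two groups of probability signals can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that the unified model normalises the ⊥ and + counts by *R*<sub>0</sub> and the − and T counts by *R*<sub>1</sub> (which follows from *R*<sub>0</sub> = *R*<sub>⊥</sub> + *R*<sub>+</sub> and *R*<sub>1</sub> = *R*<sub>−</sub> + *R*<sub>T</sub>), and the function and key names are hypothetical. Exact fractions are used so that the normalisation can be checked without rounding.

```python
from fractions import Fraction


def probability_measurements(r_bot, r_plus, r_minus, r_top):
    """Return (non_unified, unified, group) probability signals as dicts."""
    n = r_bot + r_plus + r_minus + r_top
    r0 = r_bot + r_plus      # pairs whose input bit is 0
    r1 = r_minus + r_top     # pairs whose input bit is 1
    if r0 == 0 or r1 == 0:
        raise ValueError("the unified model needs both 0 and 1 bits in X")
    counts = {"bot": r_bot, "plus": r_plus, "minus": r_minus, "top": r_top}
    # Non-unified model: every count is divided by N.
    non_unified = {k: Fraction(v, n) for k, v in counts.items()}
    # Unified model (assumed reading): divide by R0 or R1 by group.
    unified = {
        "bot": Fraction(r_bot, r0),
        "plus": Fraction(r_plus, r0),
        "minus": Fraction(r_minus, r1),
        "top": Fraction(r_top, r1),
    }
    group = {"0": Fraction(r0, n), "1": Fraction(r1, n)}
    return non_unified, unified, group


rho, rho_tilde, rho_group = probability_measurements(3, 3, 4, 3)
print(rho["minus"], rho_tilde["minus"], rho_group["0"])  # 4/13 4/7 6/13
```

Under this reading the non-unified signals sum to 1 over all four symbols, while the unified signals sum to 1 within each group.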

#### *2.7 Selection Mechanism Module*

The SM Module is composed of two models: Non-unified Model and Unified Model. Under different constructions, two models are established respectively as follows.

#### **Non-unified Model**

Selecting two measurements from the four non-unified measurements {ρ<sub>⊥</sub>, ρ<sub>+</sub>, ρ<sub>−</sub>, ρ<sub>T</sub>} gives *C*<sub>4</sub><sup>2</sup> = 6 choices. Selecting one further measurement from the two group measurements {ρ<sub>0</sub>, ρ<sub>1</sub>} gives *C*<sub>2</sub><sup>1</sup> = 2 choices. A 3-tuple *S* is defined as follows:

$$\begin{cases} S = \left(\rho\_{\alpha}, \rho\_{\beta}, \rho\_{\gamma}\right) \\ S' = \left(\rho\_{\beta}, \rho\_{\alpha}, \rho\_{\gamma}\right), \quad \alpha, \ \beta \in V, \ \gamma \in \{0, 1\}, \ \alpha \neq \beta \\ S = S' \end{cases}$$

#### **Unified Model**

Selecting two measurements from the four unified measurements {ρ̃<sub>⊥</sub>, ρ̃<sub>+</sub>, ρ̃<sub>−</sub>, ρ̃<sub>T</sub>} gives *C*<sub>4</sub><sup>2</sup> = 6 choices. Selecting one further measurement from the two group measurements {ρ<sub>0</sub>, ρ<sub>1</sub>} gives *C*<sub>2</sub><sup>1</sup> = 2 choices. A 3-tuple *S̃* is defined as follows:

$$\begin{cases} \tilde{S} = \left( \tilde{\rho}\_{\alpha}, \tilde{\rho}\_{\beta}, \tilde{\rho}\_{\gamma} \right) \\ \tilde{S}' = \left( \tilde{\rho}\_{\beta}, \tilde{\rho}\_{\alpha}, \tilde{\rho}\_{\gamma} \right), \quad \alpha, \ \beta \in V, \ \gamma \in \{0, 1\}, \ \alpha \neq \beta \\ \tilde{S} = \tilde{S}' \end{cases}$$

Under such conditions, the output signals of the SM module can be expressed as a 3D visual model in 3-tuple form, *S* or *S̃*. Specifically, ρ<sub>α</sub> or ρ̃<sub>α</sub> determines the X-axis value, ρ<sub>β</sub> or ρ̃<sub>β</sub> determines the Y-axis value, and ρ<sub>γ</sub> or ρ̃<sub>γ</sub> determines the Z-axis value.

#### **Input Group**:

*PM* → *P<sup>V</sup>* A set of four probability vectors

#### **Output Group**:

*U* ⊂ *V* A set of three interactive projections under the SM condition, *U* ⊂ *V*

*PM* → *P<sup>U</sup>* A set of three probability vectors
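The selection step described in this subsection can be enumerated directly. The sketch below is illustrative only; the names `selections` and `project` and the ASCII labels are assumptions. Because *S* = *S*′, the pair (α, β) is treated as unordered, which gives 6 × 2 = 12 candidate 3-tuples per model.

```python
from itertools import combinations

VARIANTS = ("bot", "plus", "minus", "top")   # stand-ins for ⊥, +, −, T
GROUPS = ("0", "1")


def selections():
    """Enumerate all (alpha, beta, gamma) choices with {alpha, beta} unordered."""
    return [(a, b, g) for a, b in combinations(VARIANTS, 2) for g in GROUPS]


def project(measures, group, choice):
    """Turn one choice into an (x, y, z) point from probability dictionaries."""
    a, b, g = choice
    return measures[a], measures[b], group[g]


all_choices = selections()
print(len(all_choices))  # 12

# Hypothetical measurements, only to show the shape of the output point.
measures = {"bot": 0.25, "plus": 0.25, "minus": 0.3, "top": 0.2}
group = {"0": 0.5, "1": 0.5}
print(project(measures, group, ("plus", "minus", "0")))  # (0.25, 0.3, 0.5)
```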

#### *2.8 Visualization Model*

Using a visual model, all possible measurements are calculated exhaustively on all G-1 vectors. Each 3-tuple can be drawn as a point in three-dimensional space (*xyz*-space). All G-1 points are constructed in the phase space for the selected keys.

## **3 Sample Results on 3D Maps**

In this section, two types of data sets are selected to illustrate their differences on 3D maps for comparison. The first type of data sets is generated by CA. The second type of data sets is generated by RC4.

# *3.1 Visualization Results of Unified Model*

See Fig. 2.

## *3.2 Visualization Results of Non-unified Model*

See Fig. 3.

# *3.3 Visualization Results of CA with Different Length of Initial Sequence*

See Fig. 4.

## *3.4 Visualization Results of RC4 Keystream with Different Segment Strategies*

See Fig. 5.

## **4 Analysis of Results**

The above 27 3D maps contain different information. Some important conclusions will be discussed in detail in this section.

The first group of results, shown in Fig. 2, presents two sets of six 3D maps constructed by the unified model from two data files, CA and RC4, to illustrate their 3D spatial characteristics. The three maps in Fig. 2a1–a3 show the 3D spatial characteristics of CA with different logic functions; functions No. 23, 90, and 253 are selected as examples for comparison. The three maps in Fig. 2b1–b3 show the 3D spatial characteristics of RC4 with 20 bits in every segment and different given keys; keys 12, 88, and 155 are selected as examples for comparison. From a distribution viewpoint, different logic functions can be distinguished by their three-dimensional spatial characteristics in the CA files, e.g., (a1–a3). In contrast to CA, for the RC4 keystream all spatial distributions lie in a plane, e.g., (b1–b3).

**Fig. 3** Two sets of six 3D maps based on non-unified model in different conditions; **a**1–**a**3 for the file CA; **b**1–**b**3 for the file RC4

**Fig. 4** Three sets of six 3D maps under different conditions; **a**1–**a**2 for the logic function *f* 15 and non-unified model; **b**1–**b**2 for the logic function *f* 100 and non-unified model; **c**1–**c**2 for the logic function *f* 170 and non-unified model

The second group of results, shown in Fig. 3, presents two sets of six 3D maps constructed by the non-unified model. It is interesting to observe that all maps, whether from CA data files or RC4 keystream data files, have a planar distribution, e.g., (a1–a3) and (b1–b3).
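One identity may help in reading this planar behaviour. Since *R*<sub>0</sub> = *R*<sub>⊥</sub> + *R*<sub>+</sub>, the non-unified measures satisfy ρ<sub>⊥</sub> + ρ<sub>+</sub> = ρ<sub>0</sub>, so a triple such as (ρ<sub>⊥</sub>, ρ<sub>+</sub>, ρ<sub>0</sub>) always lies on the plane *z* = *x* + *y*, whatever the data source. The text does not state which triple was plotted in Fig. 3, so the choice below is an assumption, and the sketch only checks the identity on random 0-1 sequences (the helper name is hypothetical).

```python
import random


def non_unified_point(x_bits, y_bits):
    """Return (rho_bot, rho_plus, rho_0) for equal-length 0-1 sequences."""
    n = len(x_bits)
    r_bot = sum(1 for a, b in zip(x_bits, y_bits) if (a, b) == (0, 0))
    r_plus = sum(1 for a, b in zip(x_bits, y_bits) if (a, b) == (0, 1))
    r0 = sum(1 for a in x_bits if a == 0)
    return r_bot / n, r_plus / n, r0 / n


rng = random.Random(0)   # fixed seed so the run is repeatable
points = []
for _ in range(1000):
    x = [rng.randint(0, 1) for _ in range(20)]
    y = [rng.randint(0, 1) for _ in range(20)]
    points.append(non_unified_point(x, y))

# Every point satisfies z = x + y up to floating-point rounding.
assert all(abs(px + py - pz) < 1e-12 for px, py, pz in points)
```

Triples that mix symbols from both groups, for example (ρ<sub>⊥</sub>, ρ<sub>−</sub>, ρ<sub>0</sub>), are not bound by this identity, so the check says nothing about them.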

The third group of results, shown in Fig. 4, presents three sets of six 3D maps constructed by the non-unified model from CA data files with different lengths of the initial sequence and given logic functions. Figure 4a1–a2 shows 3D maps for the No. 15 function, (b1–b2) shows 3D maps for the No. 100 function, and (c1–c2) shows 3D maps for the No. 170 function. The overall relationship of multiple-variable logic functions to spatial characteristics can be seen clearly. For example, under the non-unified model, whatever the logic function is, all spatial distributions lie in a plane, e.g., (a1–a2), (b1–b2), and (c1–c2). Different lengths of the initial sequence (*n* = 12, 13) give different spatial distributions for the same logic function, e.g., (a1–a2), (b1–b2), and (c1–c2).

**Fig. 5** Three sets of nine 3D maps under different conditions; **a**1–**a**3 for the key90 and unified model; **b**1–**b**3 for the key90 and non-unified model; **c**1–**c**3 for the key123 and non-unified model; panels **a**3, **b**3, and **c**3 use *W* = 256

The fourth group of results, shown in Fig. 5, presents three sets of nine 3D maps under different conditions, including segment strategies and keys. Three segment strategies (*W* = 20, 128, 256) are compared. Each set uses a single key, e.g., (a1–a3), (b1–b3), and (c1–c3), so that the strategies can be compared conveniently. The dispersion of the points increases as the bit length of each segment is reduced: the spatial distribution of points with 256 bits per segment is more concentrated than the distribution with 20 bits, as shown in (a1–a2), (b1–b2), and (c1–c2). The 3D maps show some commonalities in the spatial distribution across different keys and segment strategies. First, under this construction, different keys can be distinguished by their three-dimensional spatial characteristics in the model, e.g., (b1–c1), (b2–c2), and (b3–c3). Second, whatever the keys or segment strategies are, all spatial distributions lie in a plane. Third, the distribution features vary from key to key and from segment strategy to segment strategy.

## **5 Conclusions**

Both the similarities and the differences suggest that these maps offer a comparable mechanism for expressing keystreams under different given keys, together with their higher-level relationships to the stream cipher mechanism. The spatial properties of a random sequence can be detected from the distribution of point clusters in the 3D maps discussed above. Different spatial distributions are illustrated on each phase space for the relevant logic function or keystream. For example, whatever the keys or segment strategies are, all spatial distributions lie in a plane, and all maps (whether from CA data files or RC4 keystream data files) have a planar distribution under the non-unified model. Spatial distribution properties like these provide useful information for further exploration of the RC4 stream cipher. This construction could provide remarkable insights into spatial information on stream cipher construction via 3D maps. Further exploration of this scheme is required.

**Acknowledgements** Thanks to the school of software Yunnan University, to the key laboratory of Yunnan software engineering for excellent working environment, to the Yunnan Advanced Overseas Scholar Project (W8110305), to the Key R&D project of Yunnan Higher Education Bureau (K1059178), and to National Science Foundation of China (61362014) for the financial support to this project.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part VI Applications—Quantum Simulations

The best way to understanding is a few good examples.

—Isaac Newton

The true logic of this world is in the calculus of probabilities. —James Clerk Maxwell

A deep truth is a truth so deep that not only is it true but it's exact opposite is also true.

—Niels Bohr

In the direction of quantum information, several papers were published in the period 2011–2013, for example "Variant simulation system using quaternion structures", Journal of Modern Optics 59(5):484–492, 2012, and the chapter "Interactive Maps on Variant Phase Spaces" in Emerging Applications of Cellular Automata, https://doi.org/10.5772/51635, InTech Press, 2013. The variant scheme has also been cited in connection with the Afshar experiment, https://en.wikipedia.org/wiki/Afshar\_experiment.

This part of quantum simulation is composed of two chapters (16 and 17).

Chapter "Synchronous Property—Key Fact on Quantum Interferences" describes synchronous property in quantum interferences simulation on double path experiment.

Chapter "The nth Root of NOT Operators of Quantum Computers" proposes a typical operator on the nth root of NOT operators as an algebraic solution.

# **Synchronous Property—Key Fact on Quantum Interferences**

# **Particle Simulation on Double Path Experiment**

**Jeffrey Zheng**

**Abstract** The double-slit experiment plays a key role in quantum theory in distinguishing particle and wave interactions, according to Feynman's claims. In this chapter, the double path model and the variant logic principle are applied to establish a simulation system for exhaustive testing. Using Einstein quanta interaction, different measure quaternion structures are investigated. Under Symmetry/Anti-symmetry and Synchronous/Asynchronous interaction conditions, eight groups of statistical results are generated as eight histograms to show their distributions. From this set of simulation results, it can be recognized that the synchronous condition is the key fact in generating quantum wave interference patterns and that the asynchronous condition is the key fact in producing classical particle distributions. Sample results are illustrated and explanations are discussed.

**Keywords** Double path · Interaction · Probability · Statistics · Simulation

# **1 Introduction**

Feynman explored quantum measurement puzzles deeply [1, 2] and emphasized: "The entire mystery of quantum mechanics is in the double-slit experiment." This experiment directly illustrates both classical and quantum interactive results. Under single and double slit conditions, dual visual distributions are shown in particle and wave statistical distributions linked to von Neumann's measure theory [3].

J. Zheng (B)

Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), and Yunnan Advanced Overseas Scholar Project.

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_16

Starting in the 1970s and guided by CHSH [4], Aspect used experiments to test Bell inequalities [5–7]. After 40 years of development, many accurate experiments [8–10] have been performed successfully worldwide using laser, NMRI, large-molecule, quantum coding, and quantum communication approaches [5–8, 11–26].

In this chapter, a double path model is established using the Mach–Zehnder interferometer. Different approaches to quantum measures, those of Einstein, CHSH, and Aspect, are investigated through quaternion structures. Under multiple-variable logic functions and the variant principle, logic functions can be transformed into variant logic expressions as variant measures. Under such conditions, a variant simulation model is proposed. A given logic function *f* can be represented as two meta-logic functions *f*<sub>+</sub> and *f*<sub>−</sub> to simulate single and double path conditions. All 2<sup>*N*</sup> states of the *N*-bit input vectors are exhausted to produce measured data, and the recursive data are organized into eight histograms. Results are determined by symmetry/anti-symmetry properties evident in these histograms. Both results are obtained consistently from this model under synchronous/asynchronous conditions. Based on this set of simulation results, the synchronous condition shows a significant relationship to interference properties.

#### **2 Double Path Model and Their Measures**

#### *2.1 Mach–Zehnder Interferometer Model*

The Mach–Zehnder interferometer is the most popular device [6, 20] to support Young's double-slit experiment.

In Fig. 1a, a double path interferometer is shown. An input signal *X* under control function *f* causes Laser LS to emit the output signal ρ, which under the BP (Bi-polarized filter) operation yields a pair of signals: ρ<sub>+</sub> and ρ<sub>−</sub>. Both signals are processed by SW to output ρ<sup>+</sup><sub>*L*</sub> and ρ<sup>−</sup><sub>*R*</sub>, and then by IM to generate the output signals IM(ρ<sup>+</sup><sub>*L*</sub>, ρ<sup>−</sup><sub>*R*</sub>). In Fig. 1b, a representation model is described using the same signals.

**Fig. 1** Double path model **a** Mach–Zehnder double path model, **b** Description model

## *2.2 Emission and Absorption Measures of Quantum Interaction*

Einstein established a model to describe atomic interaction [27–30] with radiation in 1916. For a two-state system, let the system have two energy states: the ground state *E*<sub>1</sub> and the excited state *E*<sub>2</sub>. Let *N*<sub>1</sub> and *N*<sub>2</sub> be the average numbers of atoms in the ground and excited states, respectively. Atoms pass from the excited state *E*<sub>2</sub> to *E*<sub>1</sub> by emission at a rate d*N*<sub>21</sub>/d*t*; at the same time, atoms in the ground state pass from *E*<sub>1</sub> to *E*<sub>2</sub> by absorbing energy at a rate d*N*<sub>12</sub>/d*t*. Let *N*<sub>12</sub> be the number of atoms passing from *E*<sub>1</sub> to *E*<sub>2</sub> and *N*<sub>21</sub> be the number passing from *E*<sub>2</sub> to *E*<sub>1</sub>. In Einstein's model, the measurement quaternion is ⟨*N*<sub>1</sub>, *N*<sub>2</sub>, *N*<sub>12</sub>, *N*<sub>21</sub>⟩.

CHSH proposed spin measures testing Bell inequalities [4, 6]. They applied ⊥ → + and ||→ − to establish a measurement quaternion

$$\langle N\_{++}(a,b), N\_{+-}(a,b), N\_{-+}(a,b), N\_{--}(a,b) \rangle .$$

Experimental testing of Bell inequalities was performed by Aspect [5] in 1982. Four parameters are measured: transmission rate *N<sub>t</sub>*, reflection rate *N<sub>r</sub>*, correspondent rate *N<sub>c</sub>*, and the total number *N*<sub>ω</sub> in the ω time period. This set of measures is a quaternion ⟨*N<sub>t</sub>*, *N<sub>r</sub>*, *N<sub>c</sub>*, *N*<sub>ω</sub>⟩. Among these, *N<sub>c</sub>* is a new data type not present in the Einstein and CHSH methods; this parameter could be an extension for synchronous/asynchronous time measurement.

#### **3 Simulation Systems**

#### *3.1 Simulation Model*

Using the variant principle described in the next subsections, for an *N*-bit 0-1 vector *X* and a given logic function *f*, all *N*-bit vectors are exhausted and the variant measures generate two groups of histograms. This variant simulation system is composed of three components, Pre-process, Interaction, and Post-process, as shown in Fig. 2.

In Fig. 2a, the three components of the variant simulation model are presented. In the pre-process, an *N*-bit 0-1 vector *X* and a function *f* are fed in to output a signal ρ. After the interaction component, two groups of signals are output: *u* for the symmetry group and *v* for the anti-symmetry group. In the post-process, all *N*-bit vectors are passed through the pre-process and interaction components until the whole set of 2<sup>*N*</sup> vectors has been processed, transforming the symmetry and anti-symmetry signals into eight histograms: four for symmetry distributions and four for anti-symmetry distributions.

In Fig. 2b, only the interaction component is shown. The input signal ρ is processed by BP to generate two signals {ρ<sub>−</sub>, ρ<sub>+</sub>}. SW outputs the triple of signals {ρ<sub>−</sub>, 1 − ρ<sub>−</sub>, ρ<sub>+</sub>}, which pass through IM to generate the two groups of signals *u* and *v*.

#### *3.2 Variant Principle*

The variant principle is based on *n*-variable logic functions [31–33]. For any *n* variables, *x* = *x*<sub>*n*−1</sub> ... *x<sub>i</sub>* ... *x*<sub>0</sub>, 0 ≤ *i* < *n*, *x<sub>i</sub>* ∈ {0, 1} = *B*<sub>2</sub>. Let position *j* be the selected bit, 0 ≤ *j* < *n*, and *x<sub>j</sub>* ∈ *B*<sub>2</sub> be the selected variable. Let *y* be the output variable and *f* an *n*-variable function, *y* = *f*(*x*), *y* ∈ *B*<sub>2</sub>, *x* ∈ *B*<sub>2</sub><sup>*n*</sup>. For all states of *x*, the set *S*(*n*) composed of the 2<sup>*n*</sup> states can be divided into two sets: *S*<sub>0</sub><sup>*j*</sup>(*n*) and *S*<sub>1</sub><sup>*j*</sup>(*n*).

$$\begin{cases} \mathcal{S}\_0^j(n) = \left\{ \mathbf{x} | \mathbf{x}\_j = \mathbf{0}, \forall \mathbf{x} \in \mathcal{B}\_2^n \right\} \\\\ \mathcal{S}\_1^j(n) = \left\{ \mathbf{x} | \mathbf{x}\_j = 1, \forall \mathbf{x} \in \mathcal{B}\_2^n \right\} \\\\ \mathcal{S}(n) = \left\{ \mathcal{S}\_0^j(n), \mathcal{S}\_1^j(n) \right\} \end{cases}$$
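The partition can be illustrated with a short sketch. It assumes that each state *x* = *x*<sub>*n*−1</sub> ... *x*<sub>0</sub> is encoded as an integer with *x*<sub>0</sub> as the least significant bit; the function name is hypothetical.

```python
def partition_states(n, j):
    """Split the 2**n states of an n-bit variable by the value of bit j."""
    if not 0 <= j < n:
        raise ValueError("j must satisfy 0 <= j < n")
    s0 = [x for x in range(2 ** n) if (x >> j) & 1 == 0]   # S_0^j(n)
    s1 = [x for x in range(2 ** n) if (x >> j) & 1 == 1]   # S_1^j(n)
    return s0, s1


print(partition_states(2, 0))  # ([0, 2], [1, 3])
print(partition_states(3, 1))  # ([0, 1, 4, 5], [2, 3, 6, 7])
```

Each of the two sets holds 2<sup>*n*−1</sup> states, and together they cover *S*(*n*).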

For a given logic function *f* , there are input and output pair relationships to define four meta-logic functions { *f*⊥, *f*+, *f*−, *fT* }:

$$\begin{cases} f\_{\perp}(\mathbf{x}) = \left\{ f(\mathbf{x}) | \mathbf{x} \in S\_0^j(n), \, y = 0 \right\} \\ f\_{+}(\mathbf{x}) = \left\{ f(\mathbf{x}) | \mathbf{x} \in S\_0^j(n), \, y = 1 \right\} \\ f\_{-}(\mathbf{x}) = \left\{ f(\mathbf{x}) | \mathbf{x} \in S\_1^j(n), \, y = 0 \right\} \\ f\_{T}(\mathbf{x}) = \left\{ f(\mathbf{x}) | \mathbf{x} \in S\_1^j(n), \, y = 1 \right\} \end{cases}$$

Two canonical logic expressions are involved: the AND-OR form is selected by { *f*<sub>+</sub>(*x*), *f<sub>T</sub>*(*x*)} as the *y* = 1 items, and the OR-AND form is selected from { *f*<sub>−</sub>(*x*), *f*<sub>⊥</sub>(*x*)} as the *y* = 0 items. The items of { *f<sub>T</sub>*(*x*), *f*<sub>⊥</sub>(*x*)} satisfy *x<sub>j</sub>* = *y*, so they are invariant.

The items of { *f*<sub>+</sub>(*x*), *f*<sub>−</sub>(*x*)} satisfy *x<sub>j</sub>* ≠ *y* and form the variant logic expression. Let *f*(*x*) = ⟨*f*<sub>+</sub>|*x*|*f*<sub>−</sub>⟩ be a variant logic expression. Any logic function can be expressed in variant logic form. In the ⟨*f*<sub>+</sub>|*x*|*f*<sub>−</sub>⟩ structure, *f*<sub>+</sub> selects the 1 items in *S*<sub>0</sub><sup>*j*</sup>(*n*), as in the AND-OR standard expression, and *f*<sub>−</sub> selects the 0 items in *S*<sub>1</sub><sup>*j*</sup>(*n*), as in the OR-AND expression. For a convenient understanding of the variant representation, the two-variable logic structures are illustrated for all 16 functions in Table 1.

For example, checking two functions *f* 3 and *f* 12:

$$\begin{aligned} \{f = 3 := \langle 0 \mid 3 \rangle, f\_+ &= 11 := \langle 0 \mid \phi \rangle, f\_- = 2 := \langle \phi \mid 3 \rangle\} \\ \{f = 12 := \langle 2 \mid 1 \rangle, f\_+ &= 14 := \langle 2 \mid \phi \rangle, f\_- = 8 := \langle \phi \mid 1 \rangle\} \end{aligned}$$
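The two examples can be checked mechanically. The sketch below is an illustration under one assumption that the text does not spell out: a function number is read as a truth table in which bit *x* of the number is the output *f*(*x*), with the state *x* taken as the integer *x*<sub>*n*−1</sub> ... *x*<sub>0</sub>. This convention reproduces both examples; the helper names are hypothetical.

```python
def variant_sets(f_index, n=2, j=0):
    """Return (plus_states, minus_states) of the function numbered f_index.

    Assumes bit x of f_index is the output f(x), with state x read as the
    integer x_{n-1}...x_0.
    """
    plus, minus = [], []
    for x in range(2 ** n):
        xj = (x >> j) & 1
        y = (f_index >> x) & 1
        if xj == 0 and y == 1:
            plus.append(x)       # 0 -> 1 transition
        elif xj == 1 and y == 0:
            minus.append(x)      # 1 -> 0 transition
    return plus, minus


def meta_functions(f_index, n=2, j=0):
    """Return the numbers (f_plus, f_minus) that each keep one variant set of f."""
    plus, minus = variant_sets(f_index, n, j)
    f_plus = f_minus = 0
    for x in range(2 ** n):
        xj = (x >> j) & 1
        # f_plus keeps the + states of f and has no - state (output 1 when x_j = 1).
        if x in plus or xj == 1:
            f_plus |= 1 << x
        # f_minus keeps the - states of f and has no + state (output 0 when x_j = 0).
        if xj == 1 and x not in minus:
            f_minus |= 1 << x
    return f_plus, f_minus


print(variant_sets(3), meta_functions(3))    # ([0], [3]) (11, 2)
print(variant_sets(12), meta_functions(12))  # ([2], [1]) (14, 8)
```

An empty list corresponds to the symbol φ in the examples, e.g. `variant_sets(11)` gives `([0], [])`.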

#### *3.3 Variant Measures*

Let Δ be the variant measure function [23, 33].

$$
\Delta = \langle \Delta\_{\perp}, \Delta\_{+}, \Delta\_{-}, \Delta\_{T} \rangle
$$

$$\begin{aligned} \Delta f(\mathbf{x}) &= \langle \Delta\_{\perp} f(\mathbf{x}), \Delta\_{+} f(\mathbf{x}), \Delta\_{-} f(\mathbf{x}), \Delta\_{T} f(\mathbf{x}) \rangle \\ &= \langle \Delta f\_{\perp}(\mathbf{x}), \Delta f\_{+}(\mathbf{x}), \Delta f\_{-}(\mathbf{x}), \Delta f\_{T}(\mathbf{x}) \rangle \end{aligned}$$

$$\Delta f\_\alpha(\mathbf{x}) = \begin{cases} 1, \text{ if } f(\mathbf{x}) = f\_\alpha(\mathbf{x}), \\ 0, \text{ otherwise} \end{cases} \quad \alpha \in \{\perp, +, -, T\}$$

For any given *n*-variable state, exactly one of the four positions of Δ*f*(*x*) is 1 and the other three positions are 0.

For any *N*-bit 0-1 vector *X* = *X*<sub>*N*−1</sub> ... *X<sub>J</sub>* ... *X*<sub>0</sub>, 0 ≤ *J* < *N*, *X<sub>J</sub>* ∈ *B*<sub>2</sub>, *X* ∈ *B*<sub>2</sub><sup>*N*</sup>, an *n*-variable function *f* produces the *N*-bit 0-1 output vector *Y* = *f*(*X*) = ⟨*f*<sub>+</sub>|*X*|*f*<sub>−</sub>⟩, *Y* = *Y*<sub>*N*−1</sub> ... *Y<sub>J</sub>* ... *Y*<sub>0</sub>, 0 ≤ *J* < *N*, *Y<sub>J</sub>* ∈ *B*<sub>2</sub>, *Y* ∈ *B*<sub>2</sub><sup>*N*</sup>.

For the *J*th position, let *x<sup>J</sup>* = [... *X<sub>J</sub>* ...] ∈ *B*<sub>2</sub><sup>*n*</sup> form *Y<sub>J</sub>* = *f*(*x<sup>J</sup>*) = ⟨*f*<sub>+</sub>|*x<sup>J</sup>*|*f*<sub>−</sub>⟩, with the *N* bit positions cyclically linked. The variant measures of *f*(*X*) can be decomposed

$$\Delta \langle X : Y \rangle = \Delta f(X) = \sum\_{J=0}^{N-1} \Delta f(\mathbf{x}^J) = \langle N\_\perp, N\_+, N\_-, N\_T \rangle$$

as a quaternion ⟨*N*<sub>⊥</sub>, *N*<sub>+</sub>, *N*<sub>−</sub>, *N<sub>T</sub>*⟩.

**Table 1** Two variable logic functions and variable logic representation (*n* = 2, *j* = 0)

For example, *N* = 10, given *f*, *Y* = *f*(*X*).

$$\begin{aligned} X &= 01\ 10\ 01\ 11\ 00 \\ Y &= 10\ 10\ 10\ 10\ 10 \\ \Delta\langle X : Y \rangle &= +\ -\ \text{T}\ \bot\ +\ -\ \text{T}\ -\ +\ \bot \end{aligned}$$

$$
\Delta f(X) = \langle N\_\perp, N\_+, N\_-, N\_T \rangle = \langle 2, 3, 3, 2 \rangle, N = 10
$$

Input and output pairs are 0-1 variables on the four combinations. For any given function *f* , the quantitative relationship of {⊥, +, −, *T* } is determined directly from input/output sequences.
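The decomposition of Δ*f*(*X*) into a sum of one-hot measures can be sketched as follows for the *N* = 10 example. The sketch works directly on the given input/output bits and does not assume any particular function *f*; the function names are illustrative.

```python
def delta(x_bit, y_bit):
    """One-hot variant measure (bot, plus, minus, top) of a single bit pair."""
    return (
        int(x_bit == 0 and y_bit == 0),
        int(x_bit == 0 and y_bit == 1),
        int(x_bit == 1 and y_bit == 0),
        int(x_bit == 1 and y_bit == 1),
    )


def quaternion(x_bits, y_bits):
    """Sum the one-hot measures over all positions."""
    total = [0, 0, 0, 0]
    for x_bit, y_bit in zip(x_bits, y_bits):
        for k, value in enumerate(delta(x_bit, y_bit)):
            total[k] += value
    return tuple(total)


X = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
Y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(quaternion(X, Y))  # (2, 3, 3, 2)
```

Each call to `delta` returns a tuple with exactly one entry equal to 1, so the four totals always add up to *N*.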

#### *3.4 Measurement Equations*

Using the variant quaternion, signals are calculated by the following equations. For any *N*-bit 0-1 vector *X* and function *f*, under the measurement Δ*f*(*x*) = ⟨*N*<sub>⊥</sub>, *N*<sub>+</sub>, *N*<sub>−</sub>, *N<sub>T</sub>*⟩, *N* = *N*<sub>⊥</sub> + *N*<sub>+</sub> + *N*<sub>−</sub> + *N<sub>T</sub>*, the signal ρ is defined by

$$\rho = \frac{\Delta f(\mathbf{x})}{N} = \langle \rho\_\perp, \rho\_+, \rho\_-, \rho\_T \rangle$$

$$
\rho\_\alpha = \frac{N\_\alpha}{N}, \ 0 \le \rho\_\alpha \le 1, \quad \alpha \in \{\perp, +, -, T\}
$$

Using {ρ+, ρ−}, a pair of signals {*u*, v} are formulated:

$$\begin{cases} u = \langle u\_0, u\_+, u\_-, u\_1 \rangle = \{ u\_\beta \}, \\ v = \langle v\_0, v\_+, v\_-, v\_1 \rangle = \{ v\_\beta \} \end{cases}$$

β ∈ {0, +, −, 1}

$$\begin{cases} u\_0 = \rho\_- \oplus \rho\_+ \\ v\_0 = (1 - \rho\_-)/2 \oplus (1 + \rho\_+)/2 \\ u\_+ = \rho\_+ \\ v\_+ = (1 + \rho\_+)/2 \\ u\_- = \rho\_- \\ v\_- = (1 - \rho\_-)/2 \\ u\_1 = \rho\_- + \rho\_+ \\ v\_1 = (1 - \rho\_- + \rho\_+)/2 \end{cases}$$

where 0 ≤ *u*<sub>β</sub>, *v*<sub>β</sub> ≤ 1, β ∈ {0, +, −, 1}, ⊕: Asynchronous addition, +: Synchronous addition.

Using the {*u*, *v*} signals, each *u*<sub>β</sub> (*v*<sub>β</sub>) determines a fixed position in the relevant histogram at which the vector *X* is counted. After all 2<sup>*N*</sup> data sequences have been processed, eight symmetry/anti-symmetry histograms *H*(*u*<sub>β</sub>|*f*) *H*(*v*<sub>β</sub>|*f*), β ∈ {0, +, −, 1}, are generated.
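The measurement equations can be put together in a small exhaustive run. The sketch below is an illustration under several assumptions that the text leaves open: *n* = 2 and *j* = 0, the state at position *J* is the cyclic pair (*X*<sub>*J*+1</sub>, *X<sub>J</sub>*), function numbers are read as truth tables, and the asynchronous addition ⊕ is modelled by recording both single-path values separately in the same histogram. The function name `simulate` and the histogram keys are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def simulate(f_index, n_bits):
    """Exhaust all 2**n_bits vectors for a 2-variable function (n = 2, j = 0).

    Assumptions (not fixed by the text): the state at position J is the pair
    (X[(J+1) % N], X[J]) with X[J] as the selected bit, and bit s of f_index
    is the output for state s.
    """
    keys = ("u0", "u+", "u-", "u1", "v0", "v+", "v-", "v1")
    hist = {k: Counter() for k in keys}
    for value in range(2 ** n_bits):
        x = [(value >> k) & 1 for k in range(n_bits)]
        n_plus = n_minus = 0
        for pos in range(n_bits):
            state = 2 * x[(pos + 1) % n_bits] + x[pos]
            y = (f_index >> state) & 1
            n_plus += int(x[pos] == 0 and y == 1)
            n_minus += int(x[pos] == 1 and y == 0)
        rho_p = Fraction(n_plus, n_bits)
        rho_m = Fraction(n_minus, n_bits)
        u = {"+": rho_p, "-": rho_m, "1": rho_m + rho_p}
        v = {"+": (1 + rho_p) / 2, "-": (1 - rho_m) / 2,
             "1": (1 - rho_m + rho_p) / 2}
        for name, sig in (("u", u), ("v", v)):
            for beta in ("+", "-", "1"):
                hist[name + beta][sig[beta]] += 1
            # Asynchronous addition (assumed reading): both single-path
            # values are recorded separately.
            hist[name + "0"][sig["+"]] += 1
            hist[name + "0"][sig["-"]] += 1
    return hist


h = simulate(3, 8)
print(sorted(h["u1"].items()))
```

With this reading of ⊕, the histogram for β = 0 is the sum of the two single-path histograms by construction, whereas the synchronous histogram for β = 1 is built from the combined value and has no such decomposition in general.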

#### **4 Simulation Results**

The simulation provides a series of output results. In this section, two cases are selected: *N* = {12, 13}, *n* = 2, *j* = 0, { *f* = 3, *f*<sub>+</sub> = 11, *f*<sub>−</sub> = 2}, and { *f* = 12, *f*<sub>+</sub> = 14, *f*<sub>−</sub> = 8}. These correspond to double path, right path, left path, symmetric and nonsymmetric conditions, respectively. For convenience of comparison, sample cases are shown in Fig. 3a–c. In Fig. 3a, representation patterns are illustrated. Figure 3b represents the *f* = 3 conditions and Fig. 3c represents the *f* = 12 conditions. Eight histograms of *H*(*u*<sub>+</sub>|*f*) *H*(*u*<sub>−</sub>|*f*) are shown, with results represented by symmetric meta-functions in four groups.

#### **5 Analysis of Results**

#### *5.1 Visual Distributions*

In *H*(*u*+| *f* ) *H*(*u*−| *f* ) conditions, { *H*(*u*1| *f* ), *H*(v1| *f* )} have significant interference patterns different from other conditions. Output results are balanced.

#### *5.2 Particle Statistical Distributions*

For all symmetric or nonsymmetric cases under ⊕ asynchronous addition operations, the relevant values satisfy 0 ≤ *u*<sub>0</sub>, *v*<sub>0</sub>, *u*<sub>−</sub>, *v*<sub>−</sub>, *u*<sub>+</sub>, *v*<sub>+</sub> ≤ 1.

Checking the { *H*(*u*<sub>0</sub>|*f*), *H*(*v*<sub>0</sub>|*f*)} series, {*H*(*u*<sub>+</sub>|*f*), *H*(*u*<sub>−</sub>|*f*)} and { *H*(*v*<sub>+</sub>|*f*), *H*(*v*<sub>−</sub>|*f*)} satisfy the following equation:

$$\begin{cases} H(u\_0|f) = H(u\_-|f) + H(u\_+|f) \\ H(v\_0|f) = H(v\_-|f) + H(v\_+|f) \end{cases}$$

The equation holds for different values of *N* and *n*.

**Fig. 3** Results of symmetric meta distributions


#### *5.3 Wave Interference Patterns*

Different interference properties are observed clearly in the *H*(*u*<sub>+</sub>|*f*) *H*(*u*<sub>−</sub>|*f*) and *H*(*v*<sub>+</sub>|*f*) *H*(1 − *v*<sub>−</sub>|*f*) conditions. Under + synchronous addition operations, the relevant values satisfy 0 ≤ *u*<sub>1</sub>, *v*<sub>1</sub>, *u*<sub>−</sub>, *v*<sub>−</sub>, *u*<sub>+</sub>, *v*<sub>+</sub> ≤ 1.

In the { *H*(*u*<sub>1</sub>|*f*), *H*(*v*<sub>1</sub>|*f*)} distributions, especially the {*u*<sub>1</sub>, *v*<sub>1</sub>} cases in Fig. 3b–c, extremely strong interferences appear; compared with { *H*(*u*<sub>+</sub>|*f*), *H*(*u*<sub>−</sub>|*f*)} and { *H*(*v*<sub>+</sub>|*f*), *H*(*v*<sub>−</sub>|*f*)}, there are significant differences. Spectra in the different cases illustrate wave interference properties. All of the listed histogram distributions satisfy

$$\begin{cases} H(u\_1|f) \ne H(u\_-|f) + H(u\_+|f) = H(u\_0|f) \\ H(v\_1|f) \ne H(v\_-|f) + H(v\_+|f) = H(v\_0|f) \end{cases}$$

Single and double peaks are shown in interference patterns as classical double-slit distributions.

#### *5.4 Quaternion Measures*

It is interesting to see the relationship between the variant quaternion and other measures.

In the variant quaternion, Δ*f*(*x*) = ⟨*N*<sub>⊥</sub>, *N*<sub>+</sub>, *N*<sub>−</sub>, *N<sub>T</sub>*⟩, *N* = *N*<sub>⊥</sub> + *N*<sub>+</sub> + *N*<sub>−</sub> + *N<sub>T</sub>*.

Einstein's two-state system of interaction ⟨*N*<sub>1</sub>, *N*<sub>2</sub>, *N*<sub>12</sub>, *N*<sub>21</sub>⟩ allows the following equations to be established:

$$\begin{cases} N\_1 = N\_\perp + N\_+ \\ N\_2 = N\_- + N\_T \\ N\_{12} = N\_+ \\ N\_{21} = N\_- \\ N = N\_1 + N\_2 \end{cases}$$

From the equations, the measured pair { *N*<sub>21</sub>, *N*<sub>12</sub> } has a 1-1 correspondence to { *N*<sub>−</sub>, *N*<sub>+</sub> }.

Selecting + → 1, − → 0, CHSH's *N*<sub>±,∓</sub>(*a*, *b*) measures meet

$$\begin{cases} N\_{+,+}(a,b) \to N\_T \\ N\_{+,-}(a,b) \to N\_- \\ N\_{-,+}(a,b) \to N\_+ \\ N\_{-,-}(a,b) \to N\_\perp \end{cases}$$

$$(N\_{++}, N\_{+-}, N\_{-+}, N\_{--}) \rightarrow (N\_T, N\_-, N\_+, N\_\perp),$$

Let *N* = *N*<sub>++</sub> + *N*<sub>+−</sub> + *N*<sub>−+</sub> + *N*<sub>−−</sub>; the CHSH quaternion is then a permutation of the variant quaternion.

Aspect's quaternion (*N<sub>t</sub>*, *N<sub>r</sub>*, *N<sub>c</sub>*, *N*<sub>ω</sub>) has the following correspondence:

$$\begin{cases} N\_t \to N\_- \\ N\_r \to N\_+ \\ N\_\omega \to N \end{cases}$$

There is no parameter in the variant quaternion for the parameter *N<sub>c</sub>*. It indicates joined action numbers to distinguish single and double paths, corresponding to {*u*<sub>0</sub>, *v*<sub>0</sub>} and {*u*<sub>1</sub>, *v*<sub>1</sub>} times. In an actual experiment, this parameter is significant. In a simulated system, the parameter is a control coefficient that separates the two types of measured paths {*u*<sub>0</sub>, *v*<sub>0</sub>} and {*u*<sub>1</sub>, *v*<sub>1</sub>} when comparisons with real experiments are integrated.
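The correspondences of this subsection amount to simple index maps. The following sketch collects them; the function names are hypothetical, and the Aspect map returns only three values because *N<sub>c</sub>* has no counterpart in the variant quaternion.

```python
def to_einstein(n_bot, n_plus, n_minus, n_top):
    """Map a variant quaternion to Einstein's (N1, N2, N12, N21)."""
    return n_bot + n_plus, n_minus + n_top, n_plus, n_minus


def to_chsh(n_bot, n_plus, n_minus, n_top):
    """Map a variant quaternion to CHSH's (N++, N+-, N-+, N--)."""
    return n_top, n_minus, n_plus, n_bot


def to_aspect(n_bot, n_plus, n_minus, n_top):
    """Map to Aspect's (Nt, Nr, N_omega); Nc is left out."""
    return n_minus, n_plus, n_bot + n_plus + n_minus + n_top


q = (1, 2, 3, 4)   # an arbitrary variant quaternion, for illustration only
print(to_einstein(*q))  # (3, 7, 2, 3)
print(to_chsh(*q))      # (4, 3, 2, 1)
print(to_aspect(*q))    # (3, 2, 10)
```

The CHSH map only reorders the four entries, which is the permutation property noted in the text, while the Einstein map also forms the two group totals.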

#### **6 Conclusions**

By analyzing an *N*-bit 0-1 vector and its exhaustive sequences for variant measurement, this system simulates double path interference properties through different accurate distributions. Using this model, two groups of parameters {*u*<sub>β</sub>} and {*v*<sub>β</sub>} describe the left path, the right path, the double path for particles, and the double path for waves, with distinguished symmetry and anti-symmetry properties.

Only under synchronous conditions does the double path system provide wave-like interference patterns different from classical ones.

Comparing the variant quaternion with other quaternion structures helps in understanding the possible uses and limitations of variant simulation systems.

The *n*-variable function space has a size of 2<sup>2<sup>*n*</sup></sup>. The complexity of the whole simulation is *O*(2<sup>2<sup>*n*</sup></sup> × 2<sup>*N*</sup>), a product of a double exponential and an exponential term. How to overcome the limitations imposed by such complexity and how best to compare and contrast such simulations with real-world experimentation will be key issues in future work.
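To give a feel for how quickly these sizes grow, the following minimal sketch evaluates the two factors for a few small values; the function names are illustrative.

```python
def function_space_size(n):
    """Number of n-variable 0-1 logic functions, 2**(2**n)."""
    return 2 ** (2 ** n)


def simulation_size(n, n_bits):
    """Number of (function, input vector) pairs in an exhaustive run."""
    return function_space_size(n) * 2 ** n_bits


for n in (2, 3, 4):
    print(n, function_space_size(n), simulation_size(n, 12))
```

For *N* = 12, moving from *n* = 2 to *n* = 4 raises the number of function/vector pairs from 65,536 to 268,435,456.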

**Acknowledgements** Thanks to Mr. Colin W Campbell for editing the English, Mr. Jie Wan for generating the simulation data, and Mr. Qingping Li for making the statistical histograms.

#### **References**



# **The** *n***th Root of NOT Operators of Quantum Computers**

**Jeffrey Zheng**

**Abstract** This chapter proposes a novel approach to resolve the *n*th root of NOT problem for quantum computers using (−1, 0, 1) permutation matrices. Only logic NOT and exchange operations are required. This result provides a complete solution to design and implement the *n*th root of NOT operators of quantum computers.

**Keywords** Quantum simulator · Quantum computation · Square root of NOT · *n*-th root of NOT · Permutation matrix · Quantum logic gate

## **1 Introduction**

Feynman [1] first proposed a 'universal quantum simulator' as a step towards a true quantum computer. Since then, research and development in quantum computation and quantum computers has been the new frontier of next-generation computing for two decades [2, 3]. Classical quantum mechanics uses complex number vectors in Hilbert space to represent quantum states [4]. Any complex number is composed of two parts: a real part and an imaginary part. The imaginary number $i = \sqrt{-1}$ plays the essential role in the construction of quantum mechanics. However, the mystery of the imaginary number causes severe difficulties for its manipulation, imagination and understanding [4–6]. Considering that modern computers are built on Boolean logic principles, the question of how a traditional logic structure can implement $\sqrt{-1}$ has been puzzling and deeply entangled in quantum computing for at least two decades [7–10]. Nothing in the published literature has described a way to implement this untamed operator using traditional logic operations [2, 11, 12].

J. Zheng (B)

© The Author(s) 2019

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_17

#### *1.1 The Square Root of NOT Problem*

Following traditional logic, negation corresponds to logic NOT (¬). Initiated by Feynman [1] and further developed by Deutsch [9, 13], this problem has been represented as $\sqrt{\neg}$, 'the Square Root of NOT', one of the most difficult issues in quantum computation, especially for general quantum gates. They suggested resolving the equation $\neg = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ using logic operations for the solution. Maglicki and Wang [11] provided an example of how to resolve the problem this way.

Let the ¬ operation reverse two quantum spin states $|0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $|1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$:

$$\begin{aligned} \neg|0\rangle &= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = |1\rangle \\ \neg|1\rangle &= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} = |0\rangle \end{aligned}$$

Applying unitary rotational matrices, the $\sqrt{\neg}$ operator can be expressed as

$$\sqrt{\neg} = \frac{1}{\sqrt{2}} \begin{pmatrix} \mathrm{e}^{i\pi/4} & \mathrm{e}^{-i\pi/4} \\ \mathrm{e}^{-i\pi/4} & \mathrm{e}^{i\pi/4} \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix}.$$

Both exponentials of the form $\mathrm{e}^{\pm i\pi/4}$ and the symbol $i$ appear in these equations. From a representational viewpoint, the equations are of no help, because the symbols $i$ and $\sqrt{\neg}$ are logically equivalent: the equations form a circular definition.

To explore how to use traditional logic implementing √¬, it is necessary to analyse what has been established at the foundation levels of modern complex number construction.

#### *1.2 Complex Number in History*

The origin and development of complex numbers has a long and mysterious history [14–16]. In the eighteenth and nineteenth centuries, Euler and Gauss [15] made foundational contributions to formally identifying imaginary parts as essential components for solving *n*th-degree algebraic equations. After their work, the imaginary number was gradually accepted by mainstream mathematicians as one of the most important parts of mathematics [15]. Hamilton established consistent operations on complex numbers in 1837 [17]. He constructed a complex number *a* + *bi* as an ordered number pair (*a*, *b*).

For example, let *a* + *bi* and *c* + *di* be two complex numbers. Four essential operations: {±, •, /} can be expressed as

$$\begin{aligned} (a,b) \pm (c,d) &= (a \pm c, b \pm d) \\ (a,b) \bullet (c,d) &= (ac-bd, ad+bc) \\ \frac{(a,b)}{(c,d)} &= \left(\frac{ac+bd}{c^2+d^2}, \frac{bc-ad}{c^2+d^2}\right) \end{aligned}$$

Using ordered pair representation, complex number operations are firmly established on real number operations. No further mysterious characteristics of imaginary numbers remain in the equations because all operations are well defined in real number construction.
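The ordered-pair rules above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `pair_add`, `pair_mul` and `pair_div` are hypothetical and not part of the chapter. Only real-number arithmetic is used, and Python's built-in `complex` type appears solely as an independent cross-check.

```python
# Hamilton-style complex arithmetic on ordered pairs of real numbers.
# Function names are illustrative assumptions, not taken from the chapter.

def pair_add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def pair_mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

def pair_div(p, q):
    (a, b), (c, d) = p, q
    denom = c * c + d * d          # must be non-zero
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

if __name__ == "__main__":
    p, q = (1, 2), (3, 4)
    print(pair_add(p, q))          # sum of the pairs
    print(pair_mul(p, q))          # product of the pairs
    print(pair_div(p, q))          # quotient of the pairs
    print(complex(*p) / complex(*q))  # cross-check with built-in complex
```

Multiplying the pair (0, 1) by itself gives (−1, 0), which is the pair form of the statement that the imaginary unit squares to −1.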

#### **2 Solution of the Square Root of NOT Problem**

If we apply an imaginary number to an ordered pair, we have

$$i: (a, b) \to (-b, a)$$

If we do not restrict the $\sqrt{\neg}$ solution to the {0, 1} field but extend the field to {−1, 0, 1}, a permutation matrix can be constructed.

Let

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad I_2^+ = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad I_2^- = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \quad Z_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad Z_2^\perp = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

$$Z\_2: (a, b) \to (-b, a)$$

$$(-b, a) = (a, b) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

Because $Z_2$ provides the same result as the imaginary number when applied to the pair, it is necessary to explore the features of $Z_2$ in detail.

The two eigenvalues of $Z_2$ can be determined from its characteristic determinant.

$$\begin{aligned} |\lambda I\_2 - Z\_2| &= \begin{vmatrix} \lambda & -1 \\ 1 & \lambda \end{vmatrix} = 0 \\ \lambda^2 + 1 &= 0, \quad \lambda^2 = -1, \quad \lambda = \pm\sqrt{-1} \end{aligned}$$

This corresponds to either $\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$ or $\begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}$ as the solution. There are two unitary matrices $U_+$, $U_-$ and their two Hermitian conjugate matrices $U_+^*$, $U_-^*$ that perform similarity transformations on $Z_2$ to produce the two diagonal matrices:

$$\begin{aligned} iI\_2^\pm &= \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} = U\_+ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} U\_+^\*; \\\ iI\_2^\mp &= \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = U\_- \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} U\_-^\*. \end{aligned}$$

Although three matrices belong to one matrix group under similarity transformation, five matrices can be distinguished without any direct equality.

$$iI\_2 \neq iI\_2^{\pm} \neq Z\_2 \neq iI\_2^{\mp} \neq -iI\_2$$

Applying each of the five matrices twice gives $-I_2$ in every case.

$$\begin{aligned} \left(\pm i I\_2\right)^2 &= \begin{pmatrix} \pm i & 0 \\ 0 & \pm i \end{pmatrix} \begin{pmatrix} \pm i & 0 \\ 0 & \pm i \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I\_2 \\\ \left(i I\_2^{\pm}\right)^2 &= \left(i I\_2^{\mp}\right)^2 = \begin{pmatrix} \pm i & 0 \\ 0 & \mp i \end{pmatrix} \begin{pmatrix} \pm i & 0 \\ 0 & \mp i \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I\_2 \end{aligned}$$

and

$$Z\_2^2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I\_2$$

Therefore, the $Z_2$ matrix is an equivalent form of the imaginary number under the transformation.

For any ordered pair (*a*, *b*),

$$\begin{aligned} (\mathbf{Z}\_2)^2 &: (a, b) \to (-a, -b) \\ (\mathbf{Z}\_2)^2 &: (a, b) \xrightarrow{\mathbf{Z}\_2} (-b, a) \xrightarrow{\mathbf{Z}\_2} (-a, -b) \\ (\mathbf{Z}\_2)^2 &= -I\_2 \\ \mathbf{Z}\_2 &= \sqrt{-I\_2} \end{aligned}$$

So, the $\sqrt{\neg}$ operation can be constructed from a one-to-one correspondence with the $Z_2$ matrix.
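The action of $Z_2$ on an ordered pair can be verified numerically. The sketch below uses plain Python lists and the row-vector convention of the text (the pair multiplies the matrix from the left); the helper names `row_times_matrix` and `mat_mul` are illustrative assumptions.

```python
# Z_2 acting on an ordered pair, using the row-vector convention (a, b) * Z_2.
# Helper names are illustrative assumptions, not taken from the chapter.

Z2 = [[0, 1],
      [-1, 0]]

def row_times_matrix(v, m):
    """Multiply row vector v by square matrix m."""
    n = len(v)
    return tuple(sum(v[i] * m[i][j] for i in range(n)) for j in range(n))

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    pair = (3, 5)
    once = row_times_matrix(pair, Z2)    # (a, b) -> (-b, a)
    twice = row_times_matrix(once, Z2)   # (a, b) -> (-a, -b)
    print(once, twice)
    print(mat_mul(Z2, Z2))               # equals -I_2
```

Applying the matrix twice negates both entries of the pair, which is the sense in which $Z_2$ behaves as a square root of $-I_2$.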

Let $\langle x|$ be a quantum state with $\neg\langle x| = \langle \bar{x}|$. For the non-zero elements of $Z_2$, the two values {−1, 1} of the elements map

$$\begin{cases} -1 : \langle x| \xrightarrow{\neg} \langle \bar{x}| \\ \phantom{-}1 : \langle x| \to \langle x| \end{cases}$$

then a $\sqrt{\neg}$ operator is generated from a $Z_2$ operator.

For an ordered state pair $(\langle x|, \langle y|)$,

$$(\langle x|, \langle y|) \xrightarrow{\sqrt{\neg}} (\langle \bar{y}|, \langle x|) \xrightarrow{\sqrt{\neg}} (\langle \bar{x}|, \langle \bar{y}|) = \neg(\langle x|, \langle y|)$$

Therefore, $Z_2$ is a homologous form of the $\sqrt{\neg}$ operator.

Under this construction, the square root of NOT problem in quantum computation is solved entirely. Only two elementary operations are involved in the transformation: the logic **NOT** operation and the pair–state exchange. They can be implemented readily using traditional logic constructions.
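As a minimal sketch of the two elementary operations named above, the following Python fragment models each state by a single classical bit (an assumption made only for illustration) and builds the operator from one NOT and one exchange. The names `logic_not` and `sqrt_not` are hypothetical.

```python
# Square root of NOT on an ordered pair of classical bits:
# one logic NOT plus one exchange, mirroring (x, y) -> (not y, x).
# Modelling each state as a single bit is an illustrative assumption.

def logic_not(bit):
    return 1 - bit

def sqrt_not(pair):
    x, y = pair
    return (logic_not(y), x)

if __name__ == "__main__":
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        once = sqrt_not(pair)
        twice = sqrt_not(once)
        print(pair, "->", once, "->", twice)
```

For every input pair, two applications return the bitwise complement of the pair, while a single application does not.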

#### **3 General Solution of the** *n***th Root of NOT Operation**

In this part, a general solution of $\sqrt[n]{\neg}$, 'the *n*th root of NOT', for quantum computers is explored.

Let $J_n$ denote a conjugate permutation matrix which contains *n* columns and *n* rows, where each row (column) has exactly one non-zero element.

$$J_n = \left(J_{i,j}\right), \ 1 = \sum_{i=1}^n \left|J_{i,j}\right| = \sum_{j=1}^n \left|J_{i,j}\right|, \ J_{i,j} \in \{-1, 0, 1\}, \ i, j \in [1, n].$$

Let $I_n$ be a unit matrix, $I_{i,j} = 1$ for $i = j$ and $I_{i,j} = 0$ for $i \neq j$, $i, j \in [1, n]$.

For example, the matrices

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

are $J_n$ matrices.

Let $P_n$ be a (0, 1)-permutation matrix in which each column (row) contains only one non-zero element, and let $PS(n)$ denote the permutation space containing all $P_n$ matrices.

Let $JS(n)$ denote the conjugate permutation space.

**Lemma** *For a given n,* $PS(n)$ *contains a total number of* $n!$ *distinguishable matrices, that is,* $|PS(n)| = n!$*.*

**Theorem** *For a given n,* $JS(n)$ *contains a total number of* $2^n n!$ *distinguishable matrices,* $|JS(n)| = 2^n n!$*.*

*Proof* Each non-zero element of $J_n$ takes one of two values {−1, 1}, so the *n* non-zero elements have $2^n$ sign selections. The *n* elements can occupy a total of $n!$ different position arrangements. Sign and position selections are independent, and each combination determines a $J_n$ matrix. So there are $2^n n!$ distinguishable matrices.

**Corollary** $JS(n)$ *is a matrix space that is* $2^n$ *times larger than* $PS(n)$*.*
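The counting argument can be confirmed by brute force for small *n*. The sketch below enumerates every (−1, 0, 1) matrix with exactly one non-zero entry per row and column and compares the count with $2^n n!$; the function names are illustrative assumptions.

```python
# Brute-force count of conjugate (signed) permutation matrices for small n.
# Function names are illustrative assumptions, not taken from the chapter.
from itertools import permutations, product
from math import factorial

def conjugate_permutation_matrices(n):
    """Yield every n x n matrix over {-1, 0, 1} with exactly one
    non-zero entry in each row and each column."""
    for perm in permutations(range(n)):              # position selection
        for signs in product((1, -1), repeat=n):     # sign selection
            m = [[0] * n for _ in range(n)]
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row][col] = sign
            yield tuple(tuple(r) for r in m)

def count_js(n):
    return len(set(conjugate_permutation_matrices(n)))

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_js(n), 2 ** n * factorial(n))
```

Collecting the matrices in a set guards against double counting, so agreement of the two columns is a direct check that sign and position choices are independent.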

**Theorem** *A matrix group of simple rotation in J S*(*n*) *may contain* 2*n distinguishable matrices.*

*Proof* Using a rotation matrix $Z_n \in JS(n)$,

$$Z_n = \begin{pmatrix} 0 & 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 1 & 0 & \dots & 0 & 0 \\ 0 & 0 & 0 & 1 & \dots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \dots & 0 & 1 \\ -1 & 0 & 0 & 0 & \dots & 0 & 0 \end{pmatrix}, \quad J_{i,i+1} = 1, \ i \in [1, n-1], \ J_{n,1} = -1,$$

and a vector

$$X = \begin{pmatrix} 1 & 2 & 3 & \dots & n-1 & n \end{pmatrix}.$$

Applying $Z_n$ to the vector *X* sequentially $2n$ times, the following $2n$ vectors are produced:

$$\begin{pmatrix} X = XZ_n^{2n} \\ XZ_n \\ \vdots \\ XZ_n^n \\ XZ_n^{n+1} \\ \vdots \\ XZ_n^{2n-1} \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & \dots & n-2 & n-1 & n \\ -n & 1 & 2 & \dots & n-3 & n-2 & n-1 \\ \vdots & & & & & & \vdots \\ -1 & -2 & -3 & \dots & -n+2 & -n+1 & -n \\ n & -1 & -2 & \dots & -n+3 & -n+2 & -n+1 \\ \vdots & & & & & & \vdots \\ 2 & 3 & 4 & \dots & n-1 & n & -1 \end{pmatrix}.$$

That is, $2n$ distinguishable matrices $\{Z_n^j\}_{j=1}^{2n}$, with $Z_n^0 = Z_n^{2n} = I_n$, are included. Because $X \xrightarrow{Z_n^n} -X \xrightarrow{Z_n^n} X$, we have $Z_n^n = -I_n$ and $Z_n^{2n} = I_n$.

**Theorem** *For a* $Z_n$*, there are n eigenvalues* $\{\lambda_i\}_{i=1}^n$, $\lambda_i = \sqrt[n]{-1}$, $i \in [1, n]$*.*

*Proof*

$$|\lambda I_n - Z_n| = \begin{vmatrix} \lambda & -1 & 0 & \dots & 0 & 0 \\ 0 & \lambda & -1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & \lambda & -1 \\ 1 & 0 & 0 & \dots & 0 & \lambda \end{vmatrix} = \lambda^n + 1 = 0.$$

Therefore, $Z_n = \sqrt[n]{-I_n}$. For the non-zero values, the mapping $\begin{cases} \phantom{-}1 : \langle x| \to \langle x| \\ -1 : \langle x| \to \langle \bar{x}| \end{cases}$ takes $Z_n \to \sqrt[n]{\neg}$.

**Theorem** *For any state vector X,* $X\left(\sqrt[n]{\neg}\right)^n = \neg X$*.*

*Proof*

$$\begin{pmatrix} X \\ X\sqrt[n]{\neg} \\ \vdots \\ X\left(\sqrt[n]{\neg}\right)^{n-1} \\ X\left(\sqrt[n]{\neg}\right)^{n} = \neg X \end{pmatrix} = \begin{pmatrix} \langle 1| & \langle 2| & \langle 3| & \dots & \langle n| \\ \langle \bar{n}| & \langle 1| & \langle 2| & \dots & \langle n-1| \\ \vdots & & & & \vdots \\ \langle \bar{2}| & \langle \bar{3}| & \langle \bar{4}| & \dots & \langle 1| \\ \langle \bar{1}| & \langle \bar{2}| & \langle \bar{3}| & \dots & \langle \bar{n}| \end{pmatrix}.$$
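The rotation argument can be replayed in Python. This sketch makes two assumptions for illustration: the signed vector *X* is a tuple of integers, and each state is modelled by one classical bit. The names `rotate_signed`, `nth_root_not` and `apply_times` are hypothetical.

```python
# Z_n as "shift right by one place and negate the wrapped element", and its
# logic counterpart on bits (shift plus one NOT).
# Names and the one-bit state model are illustrative assumptions.

def rotate_signed(x):
    """X -> X Z_n : (x1, ..., xn) -> (-xn, x1, ..., x(n-1))."""
    return (-x[-1],) + tuple(x[:-1])

def nth_root_not(bits):
    """Bit version: exchange positions cyclically and NOT the wrapped bit."""
    return (1 - bits[-1],) + tuple(bits[:-1])

def apply_times(f, x, times):
    for _ in range(times):
        x = f(x)
    return x

if __name__ == "__main__":
    n = 5
    X = tuple(range(1, n + 1))
    orbit = [apply_times(rotate_signed, X, k) for k in range(2 * n)]
    for k, v in enumerate(orbit):
        print(k, v)
    print(len(set(orbit)) == 2 * n)               # 2n distinguishable vectors
    bits = (1, 0, 0, 1, 1)
    print(apply_times(nth_root_not, bits, n))     # bitwise NOT of bits
```

After *n* applications every element has wrapped around exactly once, so each one has been negated once; this is why the *n*-fold application equals NOT and the 2*n*-fold application is the identity.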

#### **4 Conclusion**

Using (−1, 0, 1) permutation matrices as basic tools, the *n*th root of NOT operators for quantum computers can be constructed and implemented with traditional logic structures. Considering that this problem has puzzled advanced quantum computer research for 20 years, this solution may give quantum computer designers a practical way to implement quantum computers using traditional logic. The details of this construction will be investigated elsewhere, and the relationships among conjugate logic, quantum logic, quantum gates and complex number structures will be explored as a foundation for quantum computers and the quantum computation of future computers.

**Acknowledgements** Thanks to Dr. G. Liu, Mrs. W. Macmillan, Dr. C. Liu, Dr. A. Tharumarajah and Dr. S. Yang for their invaluable comments, suggestions and careful corrections. Supported, in part by CRC for Intelligent Manufacturing Systems and Technologies.

#### **References**



# Part VII Applications—Binary Sequences

Unity can only be manifested by the binary. Unity itself and the idea of Unity are already two.

—Buddha

Every axiomatic (abstract) theory admits, as is well known, an unlimited number of concrete interpretations besides those from which it was derived.

Thus we find applications in fields of science which have no relation to the concepts of random event and of probability in the precise meaning of these words.

—Andrey Kolmogorov

At its most fundamental, information is a binary choice, in other words, a single bit of information is one yes-or-no choice.

—James Gleick

Various approaches of variant construction on binary sequences have been developed since 2011, starting with cellular automata data sequences used to construct 2D/3D maps. From 2014, different binary sequences generated from stream ciphers have been extensively examined and combinatorial maps were developed. Examples include Variant Pseudo-Random Number Generator, Hakin9 Extra, Issue 6, 2012 (13), 28–31, http://hakin9.org/hakin9-extra-62012/, and Interactive Maps on Variant Phase Spaces in Emerging Application of Cellular Automata, InTech Press, 113–196, 2013, http://dx.doi.org/10.5772/51635.

Further results were published, e.g., Cryptographic Sequence on Variant Maps, ASONAM 2017: 1065–1071, https://doi.org/10.1145/3110025.3110152, and Stationary Randomness of Quantum Cryptographic Sequences on Variant Maps, the 2017 IEEE/ACM International Conference, ASONAM 2017: 1041–1048, https://doi.org/10.1145/3110025.3110151.

This part of binary sequences is composed of five chapters (18–22).

Chapter "Novel Pseudorandom Number Generation Using Variant Logic Framework" proposes a novel PRNG using variant logic framework to apply mixed operations of permutation and complement in variant tables to generate random sequences under various control parameters.

Chapter "RC4 Cryptographic Sequence on Variant Maps" uses binary sequences of RC4 stream cipher on 1DP and 2DP variant maps. Different characteristics of visual distributions can be observed.

Chapter "Refined Stationary Randomness of Quantum Random Sequences on Variant Maps" checks the stationary randomness of three quantum random sequences {ANU, USTC, USTC0}, and significant measuring differences are identified.

Chapter "Using Information Entropy to Measure Stationary Randomness of Quantum Random Sequences" uses information entropy to measure stationary randomness of quantum random sequences. Data streams from USTC are selected and their quantitative measurements are compared.

Chapter "Visual Maps of Variant Combinations on Random Sequences" proposes visual maps of variant combinations on random sequences that provide a flexible framework to support various projections under complicated combinations. Typical maps are illustrated.

# **Novel Pseudorandom Number Generation Using Variant Logic Framework**

**Jeffrey Zheng**

**Abstract** Cybersecurity requires cryptology for basic protection. Among different ECRYPT technologies, stream ciphers play a central role in advanced network security applications, and pseudorandom number generators sit at the core of the mechanism. In this chapter, a novel method of pseudorandom number generation is proposed that takes advantage of the large functional space described by variant logic, a new framework for binary logic. Using permutation and complementary operations on a classical truth table to form the relevant variant table, numbers with pseudorandom properties can be selected from table entries. A simple generation mechanism is described and shown, and pseudorandom sequences are analyzed for their cycle property and complexity. This novel method may play a useful role in future applications that require higher performance in cybersecurity environments.

**Keywords** Pseudorandom number generation · Variant logic · Cryptology

# **1 Introduction**

In advanced cyber environments, cybersecurity mechanisms play a guiding role in protecting the secure information communicated and stored in network facilities [1, 2]. To achieve adequate network security, cryptology has to be placed in an essential position [1]. Unlike block ciphers, which operate with a fixed transformation on a large block of plaintext, stream ciphers operate with a time-varying transformation on individual plaintext digits. Under the stream cipher methodology, the Pseudorandom Number Generator (PRNG) is placed in the central part of the mechanism.

J. Zheng (B)

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

<sup>©</sup> The Author(s) 2019

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_18

From 2000 to 2003, the New European Schemes for Signatures, Integrity, and Encryption (NESSIE) project was carried out [3]. During 2004–2008, another European stream cipher project, eSTREAM, selected four software and three hardware schemes for ECRYPT stream ciphers [4]. Such extensive international activities on ECRYPT methodologies show the great importance of stream cipher technologies in cyber environments for wider security applications.

From a cyber resilience viewpoint [5–7], a number of researchers focus on leakage-resilient pseudorandom generators. This direction has shown interesting results in protecting valuable information against side-channel attacks.

PRNG plays a key role in stream cipher applications and is the heart of cryptology [1, 8–10], so many mathematical methodologies are applied to this field, such as linear automata, cellular automata, Galois fields, and other algebraic constructions [1, 9, 11–14]. In cryptology, Boolean logic operations are essential to create highly effective cryptology systems [1, 9, 15, 16], as binary logic generates the greatest efficiency through manipulation of only 1's and 0's. Therefore, it is advantageous to investigate potential mechanisms in binary logic due to the follow-on effect it has in cryptology.

#### **2 Classical Logic Function Table**

A classic logic function in *n* variables can be represented as a truth table [8, 9]. For a classic sequence in ordinary number order, each table contains $2^n$ columns and $2^{2^n}$ rows with a total of $2^n \cdot 2^{2^n}$ bits. An example of the standard truth table can be seen in Fig. 1a.


**Fig. 1** *n* variable truth table and variant table under *P* and Δ operators. (a) Truth table example. (b) Variant table example
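A table with these dimensions can be built directly from function indices. The sketch below assumes (this ordering is not confirmed by the text or by Fig. 1) that row *f* holds the $2^n$-bit binary expansion of *f*, with column $2^n - 1$ on the left and column 0 on the right; the name `truth_table` is illustrative.

```python
# Build the full truth table of all n-variable logic functions:
# 2**(2**n) rows (one per function), 2**n columns (one per input state).
# The row/column ordering is an illustrative assumption.

def truth_table(n):
    width = 2 ** n                  # number of columns
    rows = 2 ** width               # number of functions
    return [[(f >> col) & 1 for col in range(width - 1, -1, -1)]
            for f in range(rows)]

if __name__ == "__main__":
    table = truth_table(2)
    print(len(table), "rows x", len(table[0]), "columns")
    for f, row in enumerate(table):
        print(f, "".join(map(str, row)))
```

For *n* = 2 the table has 16 rows of 4 bits each, 64 bits in total, matching the $2^n \cdot 2^{2^n}$ count stated above.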

#### **3 Variant Logic Function Table**

Variant logic construction is a newly proposed theoretical structure [17, 18] that extends classical logic from the three basic operators {∩, ∪, ¬}. Two additional vector operators, permutation *P* and complementary Δ, are included with the original three to form the five basic operators within the novel framework. Let $S(N)$ denote a permutation group with *N* elements; then $S(N)$ contains a total of $N!$ permutation operators. Let $B_2^N = \{0, 1\}^N$ denote a binary group with *N* elements; then $B_2^N$ contains a total of $2^N$ complementary operators.

The permutation (*P*) and complementary (Δ) operators are two vector operators performed on each column vector of $2^{2^n}$ bits. For a given *P* and Δ, the two operators transform the truth table into a variant table. Permutation operators change the positions of the relevant columns but do not change their values. Complementary operators (Δ) do not change the position of any column, but may reverse all values of a column. The two given operators can be performed together to generate a variant table for further use. There are $2^n$ columns in the table as permutation elements, so the permutation group $S(2^n)$ contains a total of $(2^n)!$ permutation operators, and the complementary group $B_2^{2^n}$ includes a total of $2^{2^n}$ complementary operators. An example of the variant table can be seen in Fig. 1b.

#### **4 Variant Method of Pseudorandom Number Generation**

**Input**: *n*, *P*, Δ, *m*, *L* variables, $n \in N$, $P \in S(2^n)$, $\Delta, L, m \in B_2^{2^n}$

**Output**: $\{K_m, K_{m+1}, \dots, K_{m+L-1}\}$, $L \cdot 2^n$ bit sequences

**Method**: The process for pseudorandom number generation can be seen in Fig. 2. *n* is the number of input variables. Using *n* variables, a standard truth table can be constructed with $2^n$ columns and $2^{2^n}$ rows. *P* is a given permutation operator, $P = (P_{2^n-1} \dots P_I \dots P_0)$, $P \in S(2^n)$, where $P_I$ corresponds to the *I*-th column. A given complementary operator $\Delta \in B_2^{2^n}$, $\Delta = (\Delta_{2^n-1} \dots \Delta_I \dots \Delta_0)$, $\Delta_I \in B_2$, is performed on the *I*-th column: if $\Delta_I = 0$, all values of the column are reversed, and if $\Delta_I = 1$, all values are invariant. $0 \le m < 2^{2^n}$ is the initial position for the output sequences; starting from $K_m$ under the *L* condition, $\{K_{m+i}\}_{i=0}^{L-1}$ are the generated output 0–1 bit sequences.
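The method can be sketched in Python under several stated assumptions, none of which is confirmed by Fig. 2 or Fig. 3: row *f* of the truth table is the binary expansion of *f*; the operators are written left to right as positions $2^n - 1$ down to 0; the column placed at a position is the truth-table column named by *P* at that position; and row indices wrap around modulo $2^{2^n}$. The names `variant_row` and `variant_prng` are hypothetical, and the output is not claimed to reproduce the figures.

```python
# A hedged sketch of the variant PRNG: permute and selectively complement the
# truth-table columns, then read L consecutive rows starting at row m.
# Indexing conventions and names are illustrative assumptions.

def variant_row(n, perm, delta, f):
    """Row f of the variant table as a list of 2**n bits."""
    width = 2 ** n
    if sorted(perm) != list(range(width)) or len(delta) != width:
        raise ValueError("perm must permute 0..2**n-1; delta needs 2**n bits")
    row = []
    for src, keep in zip(perm, delta):
        bit = (f >> src) & 1                  # value of truth-table column src
        row.append(bit if keep == 1 else 1 - bit)   # delta 0 reverses, 1 keeps
    return row

def variant_prng(n, perm, delta, m, L):
    """Concatenate rows K_m .. K_(m+L-1): L * 2**n output bits."""
    rows = 2 ** (2 ** n)
    out = []
    for i in range(L):
        out.extend(variant_row(n, perm, delta, (m + i) % rows))
    return out

if __name__ == "__main__":
    bits = variant_prng(2, [1, 2, 0, 3], [0, 1, 1, 0], m=4, L=6)
    print("".join(map(str, bits)), len(bits))
```

Because a permutation combined with a fixed complement mask is a bijection on rows, the variant table still contains every $2^n$-bit pattern exactly once; only the order in which the patterns appear changes.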

#### **5 Sequence Generation Example**

For ease of understanding, an example for the *n* = 2 case is shown in Fig. 3. Parameters are initialized to arbitrary values: *n* = 2, *P* = (1203), and Δ = (0110).

After the table is generated, the pseudorandom sequence can be read off the table. For the conditions *m* = 4 and *L* = 6, a random number starting at position 4 of the variant table and containing six elements can be found.

**Fig. 2** Variant method of random number generation

**Fig. 3** Example for generation of pseudorandom sequence

#### **6 Complexity Analysis**

From an application viewpoint, it is important to have an exact complexity evaluation for the method. In the initial stage, it is necessary to manipulate $2^n$ columns, each with $2^{2^n}$ rows; a total of $2^n \cdot 2^{2^n}$ bits is required. The total complexity is of order $O(2^n \cdot 2^{2^n})$.

To generate variant table values, *P* operations need to manipulate each bit at least once, and Δ operations manipulate the same number of bits, i.e., $O(2^n \cdot 2^{2^n})$.

Selecting $L \cdot 2^n$ bits from the variant table requires $O(L \cdot 2^n)$ operations.

If a full table needs to be generated as a random resource, $O(2^n \cdot 2^{2^n})$ computational complexity is required. In general, the computational complexity lies between $O(L \cdot 2^n)$ and $O(2^n \cdot 2^{2^n})$ for $0 < L < 2^{2^n}$.

Maximal cycle length: under this construction, the maximal length of the pseudorandom number sequence is $2^n \cdot 2^{2^n}$ bits. Any shorter output sequence has a length less than this number. No clear cycle effects can be directly observed.
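To give a feel for how quickly these bounds grow, the following sketch tabulates the full-table size and the cost of selecting *L* rows for small *n*. The function names are illustrative assumptions, and the numbers are plain evaluations of the expressions above.

```python
# Evaluate the table size 2**n * 2**(2**n) and the selection cost L * 2**n.
# Function names are illustrative assumptions.

def table_bits(n):
    """Total bits in the full table: 2**n columns times 2**(2**n) rows."""
    return 2 ** n * 2 ** (2 ** n)

def selection_bits(n, L):
    """Bits read when L rows of 2**n bits are selected."""
    return L * 2 ** n

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, table_bits(n))
    print(selection_bits(2, 6))
```

Already at *n* = 5 the full table exceeds $10^{11}$ bits, so generating individual rows on demand, rather than materialising the whole table, is the only practical option for larger *n*.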

#### **7 Conclusion**

It is worthwhile to design this new PRNG method on variant logic construction, since *P* and Δ potentially provide a huge configuration space, $(2^n)! \times 2^{2^n}$ times larger than classical logic function spaces. Exploring how difficult this mechanism is to decode will be a main theoretical target for cryptologists. In addition, it is important to understand what type of distribution is relevant to this generation mechanism. Owing to the intrinsic complexity of variant logic construction, it provides potential barriers that protect this type of sequence from being decoded directly.

Considering that PRNG is placed in the central part of the stream cipher mechanism, and that stream cipher technologies are more and more important in advanced network security environments, higher performance methodologies and relevant implementations will be useful in this field. Ongoing work will focus on whether this mechanism provides better PRNG methods that help protect against side-channel attacks [1–7, 19, 20] in wider network applications, to resolve practical leakage-resilient issues in the future.

#### **References**



# **RC4 Cryptographic Sequence on Variant Maps**

**Zhonghao Yang and Jeffrey Zheng**

**Abstract** In the modern cyberspace environment, big data streams are a central issue in people's daily lives: each person produces a large number of data streams every day from personal computers, cell phones, and various wearable smart devices. Security risks in the storage and transmission of data streams may lead to the disclosure of personal privacy, so it is important for network security to have useful tools to face these challenges. Randomness testing provides useful tools to check the results of stream ciphers. Based on multiple statistical probability distributions, this chapter presents a visual scheme, variant maps, to measure a whole cryptographic sequence as multiple 1D and 2D maps. The mapping mechanism and sample cases are provided.

**Keywords** Random sequence · Big data · Variant map

# **1 Introduction**

In modern cyberspace environments, more than 2.5 EB of data streams are generated per day from global network environments [1]. Huge network companies manage massive data streams, measured in PB, every day [2]. The development of artificial intelligence makes it easier to extract valuable information from big data [3–5]. Big data and big data technology bring modern societies much convenience in many areas, along with several threats to network security [6, 7].

Z. Yang

Yunnan University, Kunming, China e-mail: houseashley07@hotmail.com

J. Zheng (B)

J. Zheng

© The Author(s) 2019

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014) and Yunnan Advanced Overseas Scholar Project.

Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_19

Stream ciphers are among the most useful schemes to protect the security of data streams in both transmission and storage. Pseudorandom number sequences are generated by various algorithms based on recursive computational models, and true random number sequences are generated by different physical methods. Typical stream ciphers are RC4 and Salsa20. Stream ciphers can also be built from block ciphers in OFB or CTR mode. In this chapter, an RC4 stream cipher is selected to generate pseudorandom sequences for testing.
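For readers who want to reproduce a 0–1 test sequence, the following is a minimal sketch of the standard RC4 key scheduling and output generation in pure Python. The key, the function names and the most-significant-bit-first unpacking are illustrative assumptions; the chapter does not state which key or bit order was used for its experiments.

```python
# Standard RC4: key-scheduling algorithm (KSA) followed by the
# pseudorandom generation algorithm (PRGA).
# Function names, key and bit order are illustrative assumptions.

def rc4_keystream(key: bytes, n_bytes: int) -> bytes:
    S = list(range(256))
    j = 0
    for i in range(256):                      # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n_bytes):                  # PRGA
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def keystream_bits(key: bytes, n_bits: int) -> list:
    """Unpack the keystream into a 0-1 list, most significant bit first."""
    stream = rc4_keystream(key, (n_bits + 7) // 8)
    bits = [(byte >> shift) & 1 for byte in stream for shift in range(7, -1, -1)]
    return bits[:n_bits]

if __name__ == "__main__":
    print(rc4_keystream(b"Key", 9).hex())
    print(keystream_bits(b"Key", 16))
```

RC4 is used here only as a source of test sequences; it is no longer considered secure for protecting real traffic.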

From a testing viewpoint, randomness tests focus on three aspects: probability, autocorrelation, and unpredictability. NIST 800-22 provides a suite of randomness testing methods based on *p*-values [8].

In this chapter, two types of statistical probability maps, 1D and 2D, are used to visualize a long pseudorandom number sequence generated by an RC4 stream cipher.

#### **2 Related Work**

The variant map is an emerging technique proposed in the 2010s to handle multiple 0–1 vectors in phase spaces under the variant framework [9–11]. Applications of variant maps have been explored for ECG data sequences [12], bat echolocation call sequences [13], gene sequences [14], and cryptographic sequences [15–17].

#### **3 Mapping Model**

This chapter uses two mapping schemes on 1D and 2D statistical probability distributions as variant maps for an input N-length 0–1 sequence. The architectural diagram of the mapping model is shown in Fig. 1. It is composed of three components: segmentation, measurement, and visualization.

**Fig. 1** Architecture of variant map for cryptographic sequence

#### *3.1 Basic Symbol*


#### *3.2 Mapping Model*

Three components can be described as follows.

• Segmentation

Input data is a 0–1 sequence *S* of length *N*. It can be divided into *M* segments and each segment has *m* elements.

$$M = \left\lfloor \frac{N}{m} \right\rfloor$$

$$S = \{s_0, s_1, \dots, s_i, \dots, s_{M-1}\}, \quad 0 \le i < M$$

• Measurement

For each segment *s*<sub>*i*</sub> of *S*, one feature *p*<sub>*i*</sub> is measured: the number of 1s in *s*<sub>*i*</sub>, with 0 ≤ *p*<sub>*i*</sub> ≤ *m*. For example, the two segments *s*<sub>1</sub> = 00011 and *s*<sub>2</sub> = 10110 give the measurements *p*<sub>1</sub> = 2 and *p*<sub>2</sub> = 3 (Fig. 2).

Measuring all segments of *S* yields a sequence of *M* measurements.

$$\{p_0, \dots, p_i, \dots, p_{M-1}\} = \{p_i\}_{i=0}^{M-1}, \quad 0 \le i < M$$

• Visualization

From the generated sequence of measurements, two types of diagrams can be created. The first is a 1D map, 1DP, sorted directly from {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> and shown in Fig. 3a. The second is a 2D map, 2DP, sorted from the pairs of measurements {(*p*<sub>*i*</sub>, *p*<sub>*i*+1</sub>)}<sub>*i*=0</sub><sup>*M*−1</sup> created from {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> and shown in Fig. 3b. This mapping scheme is a kind of Markov chain model.

**Fig. 3** Two maps; **a** 1DP; **b** 2DP
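The three components can be sketched in Python as follows. This is a minimal illustration rather than the chapter's implementation: the function names `segment_counts`, `map_1dp`, and `map_2dp` are hypothetical, and the 2DP map is assumed to count consecutive pairs (*p*<sub>*i*</sub>, *p*<sub>*i*+1</sub>) only for 0 ≤ *i* < *M* − 1, since the last measurement has no successor.

```python
from collections import Counter

def segment_counts(bits, m):
    """Split a 0-1 string into M = floor(N/m) segments and count the 1s in each."""
    M = len(bits) // m  # trailing bits that do not fill a segment are dropped
    return [bits[i * m:(i + 1) * m].count("1") for i in range(M)]

def map_1dp(p, m):
    """1DP map: histogram of the measurements p_i over the values 0..m."""
    hist = [0] * (m + 1)
    for value in p:
        hist[value] += 1
    return hist

def map_2dp(p):
    """2DP map: counts of consecutive pairs (p_i, p_{i+1}) (assumed pairing)."""
    return Counter(zip(p, p[1:]))

p = segment_counts("0001110110", 5)   # segments 00011 and 10110
print(p)                 # [2, 3]
print(map_1dp(p, 5))     # [0, 0, 1, 1, 0, 0]
print(dict(map_2dp(p)))  # {(2, 3): 1}
```

Plotting `map_1dp` as a bar chart and `map_2dp` as a pseudocolor image would give maps of the kind shown in Fig. 3.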

#### **4 Random Sequence Data Sources**

In this chapter, the pseudorandom generator is based on an AES block cipher in OFB mode. A total of 120 MB of cryptographic sequences was generated.

#### **5 Mapping Results**

The input sequence is mapped under several segment lengths. Three sets of segment lengths *m* are selected, and the corresponding 1DP and 2DP maps are shown in Fig. 4a–c for (a) *m* = {8, 16, 32, 64, 128, 256}, (b) *m* = {80, 100, 120, 140, 160}, and (c) *m* = {126, 127, 128, 129, 130}. Four enlarged 2DP maps are shown in Fig. 5 for *m* = {126, 127, 128, 129}, and two enlarged 2DP maps are shown in Fig. 6 for *m* = {128, 130}.

#### **6 Result Analysis**

In Fig. 4, both 1DP and 2DP maps are illustrated. When the input sequence is long enough compared with *m* × 2<sup>*m*</sup>, the 1DP maps correspond to binomial distributions. Significant changes appear when different segment lengths are applied.
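As a reference for the binomial shape, the expected 1DP histogram of an ideal sequence can be computed directly. The sketch below assumes independent, unbiased bits, so that the number of 1s in an *m*-bit segment follows a binomial distribution with parameters *m* and 1/2; the helper name `expected_1dp` is hypothetical.

```python
from math import comb

def expected_1dp(m, M):
    """Expected 1DP counts for M segments of m independent fair bits."""
    return [M * comb(m, k) / 2 ** m for k in range(m + 1)]

print(expected_1dp(4, 16))  # [1.0, 4.0, 6.0, 4.0, 1.0]
```

Comparing a measured 1DP map against this curve gives a quick visual check of how closely a sequence follows the ideal case.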

**Fig. 4** 1DP and 2DP maps. **a** *m* = {8, 16, 32, 64, 128, 256}; **b** *m* = {80, 100, 120, 140, 160}; **c** *m* = {126, 127, 128, 129, 130}; **d** enlarged 1DP and 2DP, *m* = {126, 127, 128, 129, 130}

For the 2DP maps in Figs. 4, 5, and 6, 2D distributions are rendered in pseudocolor to illustrate the relevant 3D structures. From the smaller maps to the enlarged maps, many interesting features can be identified, including significant symmetric and nonsymmetric properties. The enlarged maps show further refined patterns in detail.

**Fig. 5** Enlarged 1DP maps. **a** *m* = 126; **b** *m* = 127; **c** *m* = 128; **d** *m* = 129

#### **7 Conclusion**

The mapping model in this chapter focuses on a single sequence and produces two types of maps, 1DP and 2DP. 1DP maps correspond to classical statistical histograms, and 2DP maps are represented as Markov chains. Further research and experiments are required to develop useful tools for cryptographic sequences in detail (Figs. 7 and 8).

**Fig. 6** Enlarged 2DP maps. **a** *m* = 126; **b** *m* = 127; **c** *m* = 128; **d** *m* = 129

**Fig. 7** Enlarged 1DP maps. **a** *m* = 128; **b** *m* = 130

**Fig. 8** Enlarged 2DP maps. **a** *m* = 128; **b** *m* = 130

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Refined Stationary Randomness of Quantum Random Sequences on Variant Maps**

**Jeffrey Zheng, Yamin Luo and Zhefei Li**

**Abstract** In this chapter, a testing model is used to apply statistical probability in multiple distributions on three maps for a selected sequence to check refined stationary randomness on quantum sequences. Three random data sequences are collected from two quantum random resources: one from Australian National University (ANU) and two (initial and secure) from University of Science and Technology of China (USTC). Multiple results are created on three maps, and measurements of stationary randomness are illustrated and compared. Three samples show distinct stationary properties.

**Keywords** Variant maps · Quantum random sequence · Chaotic random sequence · Ordered measures · Maximal · Stationary randomness

## **1 Introduction**

In advanced social network environments, multimedia signal sequences of big data streams are composed of time series processes. Quantum experiments on a quantum satellite using quantum key distribution (QKD) systems [1] are the most advanced ICT development for establishing ultra-secure quantum communications. For a QKD system, a truly random number generator [2] plays a key role. From an analysis viewpoint, it is necessary to test stationary randomness under time variations. This section discusses a list of relevant topics: pseudo/truly random sequences, *P*\_value, statistical probability distributions, optical statistics, stationary properties, and variant maps.

J. Zheng (B)
Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng
Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

Y. Luo · Z. Li Yunnan University, Kunming, China e-mail: 1047668416@qq.com

Z. Li e-mail: 576167164@qq.com

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

#### *1.1 Pseudo/True Random Sequences*

#### **1.1.1 Pseudorandom Sequences**

Traditional stream ciphers [3] built on linear feedback shift register (LFSR) structures are used as pseudorandom number generators. LFSR stream ciphers are the core of classical stream ciphers.

The new generation of stream ciphers has been shifting from LFSR [3] to nonlinear modes: NLFSR, clock control [4], nonlinear functions, etc. In nonlinear dynamic environments, it is difficult to apply nonlinear mathematical theories, recursive models, descriptive tools, and implementation schemes.

#### **1.1.2 True Random Sequences**

Unlike pseudorandom sequences generated by stream ciphers, truly random sequences are generated by high-quality stochastic oscillators in special hardware devices based on laser photonics [5], nonlinear optics, quantum optics [6], quantum noise, thermal noise, chaos, and fractal nonlinear dynamics [7].

#### *1.2 Testing Schemes*

#### **1.2.1** *P***\_value Schemes**

Various statistical testing packages measure randomness properties of a given random sequence. The NIST 800-22 package [8] is a typical representative and provides 15 testing schemes. Using the package, it is essential to check whether *P*\_value > 0.01 for the sequence. Since such a measuring scheme provides a static condition, it is difficult to express the complex dynamic behaviors involved in random sequences using the *P*\_value parameter alone.
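As one concrete instance of a *P*\_value scheme, the frequency (monobit) test of NIST 800-22 can be sketched as below. The function name `monobit_p_value` is hypothetical and the short input is only illustrative; the package itself [8] should be used for actual testing.

```python
from math import erfc, sqrt

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of 0s and 1s."""
    n = len(bits)
    s = sum(1 if b == "1" else -1 for b in bits)  # map 1 -> +1, 0 -> -1 and sum
    s_obs = abs(s) / sqrt(n)
    return erfc(s_obs / sqrt(2))

p = monobit_p_value("1011010101")
print(p > 0.01)  # True: this short sequence is not rejected at the 0.01 level
```

The result is a single number per sequence, which illustrates why a *P*\_value alone gives a static summary rather than a picture of how the statistics vary along the sequence.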

#### **1.2.2 Multiple Statistical Probability Distributions**

Measuring random sequences under segment conditions, multiple statistical probability schemes are useful to create various distributions to illustrate complex spatial relationships.

Multivariate normal probability distributions are the most important and powerful tools for testing stochastic characteristics of a random data sequence under the framework of probability, stochastic processes, and statistics [9] for nonlinear problems. In this kind of measuring model, when a data sequence is sufficiently long, the high-dimensional probability distribution of the sequence [10] converges to a continuous Gaussian distribution. Multivariate Gaussian probability distributions support various schemes for analyzing complex stochastic data sets of measuring sequences under continuous conditions.

#### **1.2.3 Photon Statistic in Quantum Optics**

Photon statistics is the theoretical and experimental approach on the statistical distributions in photon counting experiments to analyze the statistical nature of photons in a light source.

Three types of distributions can be obtained by the light source [11]: Poissonian, super-Poissonian, and sub-Poissonian. The variance and average number of photon counts are identified for the corresponding distribution. Both Poissonian and super-Poissonian light are described by a semi-classical theory in which the light source is modeled as an electromagnetic wave and the atom is modeled by quantum mechanics. In contrast, sub-Poissonian light requires the quantization of the electromagnetic field for a proper description and is a direct measure of the particle nature of light.
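The relation between the variance and the mean of the counts distinguishes the three cases: for a Poisson distribution the variance equals the mean, super-Poissonian counts have a larger variance, and sub-Poissonian counts a smaller one. A minimal sketch of this classification follows; the function name `classify_counts` and the tolerance `tol` are hypothetical choices for illustration.

```python
from statistics import mean, pvariance

def classify_counts(counts, tol=0.05):
    """Classify count statistics by the Fano factor F = variance / mean."""
    mu = mean(counts)
    fano = pvariance(counts) / mu
    if fano > 1 + tol:
        return "super-Poissonian", fano
    if fano < 1 - tol:
        return "sub-Poissonian", fano
    return "Poissonian", fano

label, fano = classify_counts([4, 5, 5, 6, 5, 5, 4, 6])
print(label, fano)
```

In practice the tolerance would have to account for the statistical uncertainty of the estimated variance, which this sketch does not attempt.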

#### **1.2.4 Stationary Properties**

In mathematics and statistics, a stationary process is a stochastic process [12] whose joint probability distribution does not change when shifted in time. Consequently, parameters such as the mean and variance, if they exist, also do not change over time. Stationarity is an interesting property in time series analysis.

In applied mathematics, the Wiener–Khinchin theorem [13] states that the autocorrelation function (ACF) of a wide-sense stationary process has a spectral decomposition given by the power spectrum of the process. One effective way to identify a stationary time series is the ACF plot [14]: for a stationary time series, the ACF drops to zero relatively quickly.
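A sample ACF can be computed in a few lines. The sketch below uses the common biased estimator normalized by the lag-0 term; the function name `sample_acf` is hypothetical, and the input is assumed to be non-constant so that the normalization is nonzero.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation of a numeric sequence for lags 0..max_lag."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (assumed nonzero)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# A strictly alternating sequence is highly structured, so its ACF does not decay.
print(sample_acf([1, 0, 1, 0, 1, 0, 1, 0], 2))  # [1.0, -0.875, 0.75]
```

For a sequence with good randomness, the values at nonzero lags would instead stay close to zero.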

#### *1.3 Quantum Random Resources*

Quantum random numbers can be generated from a physical quantum source such as coherent laser light, by splitting a beam of light into two beams and then measuring the power in each beam. Because the light intensity in each beam fluctuates about the mean, those fluctuations can be converted into a source of random numbers [15–17] following a stationary Poisson distribution.

#### **1.3.1 ANU Resource**

The ANU Quantum Random Numbers Server is an open website [18] to offer true random numbers to anyone on the internet. Such random numbers are generated in real-time by measuring the quantum fluctuations of the vacuum. The electromagnetic field of the vacuum exhibits random fluctuations in phase and amplitude at all frequencies. By carefully measuring these fluctuations, ultra-high bandwidth random numbers can be generated.

About 1 GB of data streams was downloaded, and 100 MB was used for testing.

#### **1.3.2 USTC Resource**

At the Key Laboratory of Quantum Information, USTC, CAS, true random number sequences are generated [16]. This type of true random sequence supports advanced quantum communication devices in QKD systems [19].

More than 20 GB of quantum random number sequences were provided by USTC for random stream testing. Two data sequences are denoted USTC0 (initial) and USTC (secure), respectively. About 100 MB of data is selected from each sequence.

#### **1.3.3 Refined Properties**

From an analysis viewpoint, a Toeplitz hash algorithm takes the initial sequence USTC0 as input and produces the USTC sequence as output. The refined variations between the two sequences are an interesting property to identify in detail.

From a random testing viewpoint, initial sequences have some difficulty passing the NIST tests, whereas secure sequences are ensured to pass them. Some refined differences in random characteristics could therefore be distinguished.
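For readers unfamiliar with Toeplitz hashing, the idea can be sketched as a binary matrix-vector product over GF(2). This is a generic illustration only: the chapter does not give the matrix size or seed used at USTC, so the function name `toeplitz_hash`, the block length, and the seed below are all hypothetical.

```python
def toeplitz_hash(x, seed, k):
    """Multiply a k x n binary Toeplitz matrix by the bit vector x over GF(2).

    The matrix is defined by n + k - 1 seed bits: T[i][j] = seed[i - j + n - 1],
    so every diagonal of the matrix is constant.
    """
    n = len(x)
    if len(seed) != n + k - 1:
        raise ValueError("seed must have n + k - 1 bits")
    return [sum(seed[i - j + n - 1] & x[j] for j in range(n)) % 2
            for i in range(k)]

raw = [1, 0, 1, 1, 0, 1]          # hypothetical raw bits (n = 6)
seed = [1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical seed (n + k - 1 = 8 bits)
print(toeplitz_hash(raw, seed, 3))
```

The output is shorter than the input (here 3 bits from 6), which is how such a post-processing step trades length for improved statistical quality.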

#### *1.4 Variant Framework*

Various schemes following the top-down strategy are explored to use multiple measures to partition special phase spaces from a top state set to multiple bottom states via multilevels of a hierarchy in combinatorial algorithms [20], image analysis and processing for many years.

The conjugate classification [21] is proposed to apply seven measures in a hierarchy to partition the kernels of four regular plane lattices on *n* = {4, 5, 7, 9} cases for 2D binary images. For 1D cellular automata sequences, global random behaviors are visualized in 2D maps.

For *n*-tuple bit vectors, the variant logic framework [22] was proposed, and various applications have been explored: a 3D visual method for random number sequences [23], a variant Pseudorandom Number Generator (PRNG) [24], computational simulation of quantum interactions [25], noncoding DNA analysis, bat echolocation [26], and stationary randomness [27].

#### *1.5 Proposed Scheme*

To test stationary randomness conveniently, we propose a testing system for a random sequence of length *N*. The sequence is divided into *M* segments of a given length *m*, and a 2-tuple of measures is extracted from each 0-1 segment: the number of 1 elements and the number of 01 patterns in the segment. The paired measures form an ordered measuring sequence of *M* pairs.

The pairs of the measuring sequence are directly separated as two independent measuring sequences to keep each parameter in the same order. A total of three sequences of distinct measures are constructed including two sequences on single measures and one sequence on 2-tuple measures.

Following this approach, two sets of single measuring sequences are sorted as two 1D numeric arrays as statistical histograms corresponding to 1D maps and the 2-tuple measuring sequence is sorted as a 2D integer array as statistic histograms being a 2D map. Under the controlling operations on the changes of shift displacement, multiple results of the three measuring sequences are transformed into 1D statistic histograms and 2D pseudo-color maps to show effective patterns from the generated sequence under various positions and conditions on a list of shift operations.

#### *1.6 Organization of the Chapter*

This chapter describes a testing system for stationary random sequences, with the system architecture given in Sect. 2. In Sect. 3, test results are provided for three quantum random sequences. Based on the visual maps in Sect. 3, result analysis and a brief comparison are given in Sect. 4. Finally, Sect. 5 summarizes the main results.

#### **2 Testing System**

To describe the testing system, diagrams are shown in Fig. 1.

**Fig. 1** The architecture of testing stationary random sequences

#### *2.1 System Architecture*

This system is composed of five parts: Input, Shifted Transformation (ST), Segment Measurement (SM), Combinatorial Projection (CP), and Output.

The input of the testing system is a selected 0-1 sequence and its output is composed of three maps, two in 1D and one in 2D for visual distributions, and three maximals to be processed by ST, SM, and CP modules, respectively.

Further technical details are described in the chapter "Stationary Randomness of Three Types of Six Random Sequences on Variant Maps" of this book.

#### **3 Testing Results**

Three quantum random sequences are selected from ANU and USTC resources.

Typical results of testing stationary properties for three sequences in nine maps are shown in Fig. 2. Three sets of results are shown in Fig. 3a, b. In Fig. 3a, six values of *r* = {0, 16, 32, 96, 112, 128} are selected to show three pairs of corresponding maps: 1DP, 2DPQ, and 1DQ for three sequences on the top part. Nine 2D maps of maximal curves for *r* = 0 − 128 are shown to illustrate refined properties in stationary random curves on the bottom column. In Fig. 3b, three maximal curves on three 2D maps are compared. In Fig. 4a–c, three larger maps on *r* = {48, 64, 80} are shown corresponding to (a) 1DP, (b) 2DPQ, and (c) 1DQ for three cases. Three larger maps of three maximal curves are shown in Fig. 5.

#### *3.1 Quantitative Measurements*

For a *G* map, let *G*<sub>*x*</sub> be the average variation, Δ*G*<sub>*x*</sub> be the region of variations, and *G*<sub>*x*</sub><sup>*R*</sup> = Δ*G*<sub>*x*</sub>/*G*<sub>*x*</sub> be the variation ratio. For convenience of comparison, let {Max, Min} be the {largest, smallest} value on a maximal curve; Max-Min is their difference, and |*ANU* − *USTC*| is the absolute difference between the ANU and USTC measures.

**Fig. 2** ANU, USTC and USTC0 random sequences on 1DP, 2DPQ, and 1DQ maps

Let (*Max* − *Min*)/|*ANU* − *USTC*| be a relative ratio between (Max-Min) and |*ANU* − *USTC*|.
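These quantities can be computed from a maximal curve as sketched below. The sketch assumes that the region of variations is the difference between the largest and smallest values of the curve, which the text does not state explicitly; the function names and the sample numbers are hypothetical.

```python
def variation_measures(curve):
    """Average, range, and variation ratio of a maximal curve over shifts r."""
    g_avg = sum(curve) / len(curve)
    g_range = max(curve) - min(curve)   # assumed meaning of "region of variations"
    return g_avg, g_range, g_range / g_avg

def relative_ratio(curve, anu_value, ustc_value):
    """(Max - Min) / |ANU - USTC| for one measure."""
    return (max(curve) - min(curve)) / abs(anu_value - ustc_value)

curve = [100, 102, 98, 100]              # hypothetical maximal values per shift
print(variation_measures(curve))         # (100.0, 4, 0.04)
print(relative_ratio(curve, 100, 98))    # 2.0
```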

#### **4 Result Analysis**

The nine maps in Fig. 2 are arranged in three columns. The three 1DP maps have similar bell-shaped distributions, illustrating Poissonian distributions. The three 2DPQ maps are 2D distributions with different symmetries. Maximal elements in the ANU, USTC, and USTC0 maps show strong vertically oriented features. The three maps are symmetric in the left/right direction and show a broken symmetry in the up/down direction. Pseudo-color pixels on the three maps are shown in 3D shapes. Compared with the three 1DP maps, the three 1DQ maps have similar distributions with narrower bell shapes, illustrating sub-Poissonian distributions.

Six groups of results for shifts *r* = {0, 16, 32, 96, 112, 128} are shown in the top part of Fig. 3a, and each group contains nine distributions in three columns. The three random sequences have strong stationary randomness, so all maps keep a similar style with only minor changes under shift operations. Larger maps for *r* = {48, 64, 80} in Fig. 4a–c provide refined visual information to show their variations in detail. Enlarged and larger maximal curves are shown in Figs. 3b and 5 for *r* = 0–128 as nine 2D maps with values of the average variation and the region of variations. From the maximal and minimal stationary regions, variation ratios of 1–2% are observed for 1DP and 1DQ and 5% for 2DPQ. Three curves of maximals on three 2D maps are illustrated in Figs. 3b and 5.

**Fig. 3** ANU, USTC and USTC0 random sequences on three maps and maximals (**a**), (**b**); **a** Three pairs of nine variant maps in six groups and three pairs of nine maximal maps; **b** Three 2D maps of three maximal curves for ANU, USTC, and USTC0

**Fig. 4** ANU, USTC, and USTC0 random sequences on enlarged maps, *r* = {48, 64, 80}; **a** 1DP; **b** 2DPQ; **c** 1DQ


#### *4.1 Relative Ratios on Differences*

Details of the three maximal measures are compared in Table 1. For the ratios of Max-Min to |*ANU* − *USTC*|, the three parameters {*Q*<sub>*x*</sub>, Δ*Q*<sub>*x*</sub>, *Q*<sub>*x*</sub><sup>*R*</sup>} on 1DQ maps have values of 1; the observed values are 81 for *P*<sub>*x*</sub> and 1.6 for *P*<sub>*x*</sub><sup>*R*</sup>, and 65 for *PQ*<sub>*x*</sub> and 7.9 for *PQ*<sub>*x*</sub><sup>*R*</sup>.

From this set of testing results, the ANU and USTC samples show similar stationary properties, whereas USTC0 shows different stationary properties among the three sequences. Significant differences in relative ratios are observed in the 2DPQ variation measurements.


**Table 1** Comparisons on three measures for ANU, USTC, and USTC0 samples

#### **5 Conclusion**

It is feasible to evaluate stationary randomness for a random sequence using the testing system. From three maps {1DP, 1DQ, 2DPQ}, maximals are identified for shift *r* : 0 − *m*. Three 2D maps of maximal curves provide refined characteristics to evaluate stationary randomness. Further explorations and applications are required to check the testing system on other applications.

**Acknowledgements** Thanks to the Key project of Quantum Communication of Yunnan Province, National Science Foundation of China (61362014) and High-Level Overseas Professional Project of Yunnan Province for financial supports to this project. Thanks to the Key Laboratory of Quantum Information, USTC, CAS, and the ANU Quantum Optical Laboratory for providing quantum random sequences.

#### **References**



# **Using Information Entropy to Measure Stationary Randomness of Quantum Random Sequences**

**Weizhong Yang, Yamin Luo, Zhefei Li and Jeffrey Zheng**

**Abstract** Different statistical measurements can be used to determine stationary randomness for random sequences. This chapter proposes a testing scheme for random sequences that uses information entropy as the measurement. Datasets are collected from the University of Science & Technology of China (USTC), and three quantum random sequences are selected for testing. Multiple results are created on three maps and entropy curves, and quantitative measurements of stationary randomness are compared. The three differences of Max-Min entropy variation ratios are bounded within the [0.08, 0.09]% region. The whole structure has measurable stationary properties.

**Keywords** Variant maps · Quantum random sequence · Ordered measures · Entropy · Stationary randomness

W. Yang

W. Yang

Key Laboratory of Quantum Information of Yunnan, School of Software, Yunnan University, Kunming, China e-mail: yangweizhong@126.com

Y. Luo · Z. Li · J. Zheng (B) Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

Y. Luo e-mail: 1047668416@qq.com

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China

Z. Li e-mail: 576167164@qq.com

#### **1 Introduction**

From a statistical viewpoint, various parameters of a statistical process [2–4, 7] can be stationary invariant [6] under shift operations on random sequences. Using variant maps [8], a long random sequence is usually transformed into 1D and 2D statistical distributions as three maps: 1DP, 1DQ, and 2DPQ [9]. For each map, dividing each count by the total count turns a counting number into a probability measure. In this way, three sets of probability measures are generated. Applying the information entropy function to all probability parameters of a map yields one information entropy measurement per map, determined by the distribution, for assessing stationary randomness.

#### **2 Test Methodology**

The test for stationary randomness requires a sequence of length *N*. The given input sequence is divided into *M* segments of a given length *m*, and a 2-tuple of measures is extracted from each 0-1 segment: the number of 1 elements and the number of 01 patterns in the segment. The paired measures form an ordered measuring sequence of *M* pairs.

The pairs of the measuring sequence are directly separated as two independent measuring sequences to keep each parameter in the same order. A total of three sequences of distinct measures are constructed including two sequences on single measures and one sequence on 2-tuple measures.

Following this approach, two sets of single measuring sequences are sorted as two 1D numeric arrays as statistical histograms corresponding to 1D maps and the 2-tuple measuring sequence is sorted as a 2D integer array as statistic histograms being a 2D map. Under the controlling operations on the changes of shift displacement, multiple results of the three measuring sequences are transformed into 1D statistic histograms and 2D pseudo-color maps to show effective patterns from the generated sequence under various positions and conditions on a list of shift operations.

#### *2.1 Dataset*

#### **2.1.1 USTC Resource**

At the Key Laboratory of Quantum Information, USTC, CAS, quantum random number sequences are generated [5]. This type of true random sequence supports advanced quantum communication devices in QKD systems [1].

**Fig. 1** Methodology for information entropy testing stationary random sequences

More than 20 GB of quantum random number sequences were provided by USTC for random stream testing. Three of the eight sequences are selected, from three stages (1 Initial, 2 Secure, and 4 Filtered). Each random sequence has a length of about 8 MB.

#### **3 Method**

#### *3.1 Methodology*

This method consists of five steps (Fig. 1): Input, Shifted Transformation (ST), Segment Measurement (SM), Combinatorial Projection (CP), and Output.

The input of the testing system is a selected 0-1 sequence and its output is composed of three maps, two in 1D and one in 2D for visual distributions, and three maximals to be processed by ST, SM, and CP.

#### *3.2 Description of Steps*

The testing system consists of three steps: {ST, SM, CP}.

**Input**: *X*, a 0-1 sequence of *N* = *m* × *M* bits; *m*, the segment length; *M*, the total number of segments; *r*, the shift length.

**Output**: Three maps {1DP, 1DQ, 2DPQ}; three maximals {1DP<sub>*x*</sub>, 1DQ<sub>*x*</sub>, 2DPQ<sub>*x*</sub>}.

**Process**: Shift *X* by *r* positions to obtain *Y* = *X*(*r*) in ST. Build the segment measuring sequences in SM, then project the three measuring sequences as three maps and extract three maximals in CP.

Let *X*, *Y* be 0-1 sequences with *N* elements. ST takes the sequence *X* as input and shifts the whole sequence by *r* positions to give the shifted sequence *Y* = *X*(*r*) (i.e., a cyclic shift right + or shift left −).


$$Y = X(r), \quad Y[I] = X[(I \pm r) \bmod N], \tag{1}$$

$$0 \le I < N; \quad X[I], Y[I] \in \{0, 1\}$$
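Equation (1) amounts to a cyclic rotation of the sequence. A minimal sketch follows, with the hypothetical function name `shift_transform`; as an assumed sign convention, a positive *r* is implemented here as *Y*[*I*] = *X*[(*I* + *r*) mod *N*] and a negative *r* as the opposite direction.

```python
def shift_transform(x, r):
    """Cyclic shift: Y[I] = X[(I + r) mod N]; a negative r shifts the other way."""
    n = len(x)
    return [x[(i + r) % n] for i in range(n)]

print(shift_transform([0, 0, 1, 1, 0], 2))    # [1, 1, 0, 0, 0]
print(shift_transform([0, 0, 1, 1, 0], -2))   # [1, 0, 0, 0, 1]
```

Because the shift is cyclic, the number of bits and the number of 1s are unchanged; only the segment boundaries move relative to the data.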

SM takes the shifted vector as input and divides it into *M* segments. The element at the *j*th position, 0 ≤ *j* < *m*, of the *i*th sub-vector, 0 ≤ *i* < *M*, is denoted *Y*<sub>*i*,*j*</sub>.

After segmentation, the sub-vectors form an *M* × *m* matrix (*M* rows of *m* bits each); the *m* positions of the *i*th row vector correspond to a 2-tuple of measures (*p*<sub>*i*</sub>, *q*<sub>*i*</sub>).

$$Y = \{Y_i\}_{i=0}^{M-1} \tag{2}$$

$$Y_i = \{Y_{i,0}, Y_{i,1}, \dots, Y_{i,j}, \dots, Y_{i,m-1}\} \tag{3}$$

$$0 \le i < M, \quad 0 \le j < m$$

$$Y_i \to (p_i, q_i), \quad 0 \le i < M \tag{4}$$

$$\{Y_i\}_{i=0}^{M-1} \to \{(p_i, q_i)\}_{i=0}^{M-1} \tag{5}$$

The pair of 2-tuple measures (*pi*, *qi*) is determined by the following formula:

$$Y_{i,j} = Y[J] \in \{0, 1\}; \quad J = i \times m + j, \tag{6}$$

$$0 \le i < M, \quad 0 \le j < m, \quad 0 \le J < m \times M$$

$$p_i = \sum_{j=0}^{m-1} Y_{i,j}, \quad Y_{i,j} \in \{0, 1\}, \quad 0 \le p_i \le m; \tag{7}$$

$$q_i = \sum_{j=0}^{m-1} \left[(Y_{i,(j-1) \bmod m}, Y_{i,j}) = (0, 1)\right], \quad 0 \le q_i \le \lfloor m/2 \rfloor; \tag{8}$$

For example, for *X* = 0011010010 with *r* = 0, *N* = 10, *M* = 2, and *m* = 5, the segments are 00110 and 10010, giving (*p*<sub>0</sub> = 2, *q*<sub>0</sub> = 1) and (*p*<sub>1</sub> = 2, *q*<sub>1</sub> = 2).
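Equations (6)–(8) and the example above can be checked with a short sketch. The function name `segment_measures` is hypothetical; the cyclic index (*j* − 1) mod *m* of Eq. (8) is obtained through Python's negative indexing.

```python
def segment_measures(y, m):
    """Return the ordered list of (p_i, q_i) pairs for a 0-1 sequence y."""
    M = len(y) // m
    pairs = []
    for i in range(M):
        seg = y[i * m:(i + 1) * m]
        p = sum(seg)  # number of 1 elements, Eq. (7)
        # seg[j - 1] wraps to the last bit when j = 0, matching (j - 1) mod m
        q = sum(1 for j in range(m) if (seg[j - 1], seg[j]) == (0, 1))
        pairs.append((p, q))
    return pairs

bits = [int(c) for c in "0011010010"]
print(segment_measures(bits, 5))   # [(2, 1), (2, 2)]
```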

The output from SM is a sequence of *M* ordered 2-tuple measures {(*p*<sub>*i*</sub>, *q*<sub>*i*</sub>)}<sub>*i*=0</sub><sup>*M*−1</sup>.

CP consists of Split and Projection steps. Split takes the 2-tuple measuring sequence {(*p*<sub>*i*</sub>, *q*<sub>*i*</sub>)}<sub>*i*=0</sub><sup>*M*−1</sup> and splits it into two independent measuring sequences, {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> and {*q*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup>, keeping the original order invariant.

The three measuring sequences are {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup>, {*q*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup>, and {(*p*<sub>*i*</sub>, *q*<sub>*i*</sub>)}<sub>*i*=0</sub><sup>*M*−1</sup>.

The Projection step turns the sequence into histograms: Project Array (PA), Color Map (CM), and Get Entropy (GE). For three measuring sequences, two types of 1D and 2D measures will be processed separately.

The PA processes measuring sequences to transform them into integer arrays and the CM will organize them on either normalized histograms (1D measures) or color maps (2D measures), respectively.

The 1D measures involve two measuring sequences: {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> and {*q*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup>. Let *P*[*m* + 1], *Q*[⌊*m*/2⌋ + 1] be 1D integer arrays and *NP*[*m* + 1], *NQ*[⌊*m*/2⌋ + 1] be 1D floating-point arrays that hold the corresponding elements.

The 1DP statistic histogram is generated from the sequence {*p*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> using the two arrays *NP* (floating point) and *P* (integer), each with (*m* + 1) elements. For the *j*th elements *NP*[*j*], *P*[*j*], 0 ≤ *j* ≤ *m*, and the entropy element 1DP<sub>*e*</sub>, the output is obtained by the following procedure:

$$\begin{array}{ll}
\text{Initialization:} & \forall j,\ 0 \le j \le m:\ NP[j] = 0.0,\ P[j] = 0; \\
\text{Calculation:} & \text{for } (i = 0;\ i < M;\ i{+}{+})\ \{P[p_i]{+}{+};\} \\
\text{Normalization:} & \text{for } (j = 0;\ j \le m;\ j{+}{+})\ \{NP[j] = P[j]/M;\} \\
\text{Get Entropy:} & 1\mathrm{DP}_e = -\sum_{j=0}^{m} NP[j] \log_2(NP[j]);
\end{array}$$

In the 1DP map, the PA corresponds to Initialization and Calculation, the CM handles Normalization, and the GE determines the entropy element of the map.

The 1DQ statistic histogram is generated from the sequence {*q*<sub>*i*</sub>}<sub>*i*=0</sub><sup>*M*−1</sup> using the two arrays *NQ* (floating point) and *Q* (integer), each with (⌊*m*/2⌋ + 1) elements. For the *j*th elements *NQ*[*j*], *Q*[*j*], 0 ≤ *j* ≤ ⌊*m*/2⌋, and the entropy element 1DQ<sub>*e*</sub>, the output is obtained by the following procedure:

$$\begin{array}{ll}
\text{Initialization:} & \forall j,\ 0 \le j \le \lfloor m/2 \rfloor:\ NQ[j] = 0.0,\ Q[j] = 0; \\
\text{Calculation:} & \text{for } (i = 0;\ i < M;\ i{+}{+})\ \{Q[q_i]{+}{+};\} \\
\text{Normalization:} & \text{for } (j = 0;\ j \le \lfloor m/2 \rfloor;\ j{+}{+})\ \{NQ[j] = Q[j]/M;\} \\
\text{Get Entropy:} & 1\mathrm{DQ}_e = -\sum_{j=0}^{\lfloor m/2 \rfloor} NQ[j] \log_2(NQ[j])
\end{array}$$

Using the *P*, *NP*, *Q*, and *NQ* arrays, the corresponding 1D statistical histograms can be generated as 1D maps.

In the 1DQ map, the PA corresponds to Initialization and Calculation; the MA handles Normalization and the GE identifies the entropy element of the map.

The 2D measures process a single measuring sequence of pairs: $\{(p_i, q_i)\}_{i=0}^{M-1}$. Let $PQ$ and $NPQ$ be two 2D (integer, float) arrays.

A 2DPQ statistic histogram is generated from the sequence $\{(p_i, q_i)\}_{i=0}^{M-1}$ using the 2D arrays $PQ$ and $NPQ$, each with $(m+1) \times (\lfloor m/2 \rfloor + 1)$ elements. For the $(i, j)$th elements $PQ[i, j]$ and $NPQ[i, j]$, $0 \le i \le m$, $0 \le j \le \lfloor m/2 \rfloor$, and the entropy element $\mathrm{2DPQ}_e$, their values can be obtained by the following procedure:

$$\begin{array}{l} \text{Initialization: } \forall PQ[i, j] = 0,\ 0 \le i \le m,\ 0 \le j \le \lfloor m/2 \rfloor; \\ \text{Calculation: } for(i = 0; i < M; i++)\ \{PQ[p_i, q_i]++;\} \\ \text{Pseudo-color: } \forall PQ[i, j],\ 0 \le i \le m,\ 0 \le j \le \lfloor m/2 \rfloor \\ \text{Normalization: } for(i = 0; i \le m; i++)\ \{ for(j = 0; j \le \lfloor m/2 \rfloor; j++)\ \{NPQ[i, j] = PQ[i, j]/M;\}\} \\ \text{Get Entropy: } \mathrm{2DPQ}_e = -\sum_{j=0}^{\lfloor m/2 \rfloor} \sum_{i=0}^{m} NPQ[i, j] * \log_2(NPQ[i, j]) \end{array}$$

In the 2DPQ map, the PA corresponds to Initialization and Calculation; the MA handles Pseudo-color, Normalization and the GE identifies the entropy element of the map.
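A minimal Python sketch of the 2D procedure is given below. The function name `joint_histogram_entropy` and the sample pairs are hypothetical. The Pseudo-color step is omitted because it only affects rendering, and empty cells are skipped in the entropy sum (assumed convention 0 · log2(0) = 0).

```python
import math

def joint_histogram_entropy(pairs, m):
    """Return (counts, normalized, entropy) for pairs (p, q) with
    0 <= p <= m and 0 <= q <= m // 2."""
    rows, cols = m + 1, m // 2 + 1
    pq = [[0] * cols for _ in range(rows)]           # Initialization
    for p, q in pairs:                               # Calculation: PQ[p_i, q_i]++
        pq[p][q] += 1
    total = len(pairs)                               # M, the number of pairs
    npq = [[c / total for c in row] for row in pq]   # Normalization
    # Get Entropy over all non-empty cells
    entropy = -sum(x * math.log2(x) for row in npq for x in row if x > 0)
    return pq, npq, entropy

# Hypothetical pair sequence with m = 4: a 5 x 3 histogram
pairs = [(0, 0), (1, 1), (2, 0), (4, 2)]
PQ, NPQ, e2 = joint_histogram_entropy(pairs, m=4)
```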

Through the CP module, three measuring sequences are transformed into two 1D arrays and one 2D array with (*m* + 1), (*m*/2 + 1) and (*m* + 1) × (*m*/2 + 1) clusters.

The outputs of the testing system are three maps {1DP, 1DQ, 2DPQ} and three entropies {$\mathrm{1DP}_e$, $\mathrm{1DQ}_e$, $\mathrm{2DPQ}_e$}, which serve respectively as the expected statistic distributions and as representatives of the input 0-1 sequence.

#### **4 Results**

Three quantum random sequences are selected from USTC {1, 2, 4} streams.

Typical results of testing stationary properties for the three sequences are shown as nine maps in Fig. 2. The top part contains three 2D maps of global entropy curves for *r* = 0–128, which illustrate refined properties of the stationary random curves. Three sets of variant maps at *r* = 0 and their enlarged entropy curves for *r* = 0–128 are arranged in three columns to illustrate the corresponding 1DP, 1DQ, and 2DPQ maps for the three sequences. Three larger maps of the three global entropy curves are shown in Fig. 3.

For a *G* map, let $G_e$ be an average entropy variation, $\Delta G_e$ be a region of entropy variations, and $G_e^R = \Delta G_e / G_e$ be an entropy variation ratio. Three entropy curves on three 2D maps are compared. Three entropy measurements and {Max, Min, Max-Min} values for three sequences are listed in Table 1. Three variation ratios and their numeric quantities are listed in Table 2.
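The ratio can be computed from a list of entropy values, one per shift *r*. The sketch below assumes that the average is the arithmetic mean and that the region of variation is Max − Min, matching the {Max, Min, Max-Min} columns mentioned for Table 1; the function name and the sample values are hypothetical.

```python
def variation_ratio(entropies):
    """Return (mean, spread, ratio) for a list of entropy values.

    Assumption: the average entropy is the arithmetic mean and the
    region of variation is max - min over all shifts.
    """
    mean = sum(entropies) / len(entropies)
    spread = max(entropies) - min(entropies)
    return mean, spread, spread / mean

# Hypothetical entropy values for three shifts
mean, spread, ratio = variation_ratio([1.0, 2.0, 3.0])
```

A perfectly flat entropy curve gives a ratio of 0, so smaller ratios indicate a more stationary curve.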

#### **5 Result Analysis**

Three 2D maps of global entropy curves show stronger stationary randomness under shift operations for *r* = 0–128. The three entropy curves on each map are three stable horizontal lines. From a global viewpoint, the entropy curves of No. 1 (PQ and P) differ significantly from those of No. 2 and No. 3, while No. 2 and No. 3 have similar measures.

**Fig. 2** Three USTC random sequences {1, 2, 4} on 2DPQ, 1DP, and 1DQ maps and *r* = 0–128 entropy curves

**Table 1** Comparisons on three measures for three USTC samples

**Table 2** $Q_e + P_e : PQ_e$ measures

Among the nine variant maps in 2DPQ, 1DP, and 1DQ, the three 2DPQ maps are 2D distributions with different symmetric patterns. Maximal elements in the three maps show stronger vertically oriented features. The three maps are symmetric in the left/right direction and show a broken symmetry in the up/down direction. Pseudo-color pixels on the three maps are shown as 3D shapes. The three 1DP maps have similar bell-shaped distributions that illustrate Poissonian distributions. Compared with the three 1DP maps, the three 1DQ maps have similar distributions with narrower bell shapes that illustrate sub-Poissonian distributions.

However, nine enlarged entropy curves for each type have significantly different variations and distributions. Local curves are bounded in narrow regions with random variations.

It is difficult to tell detailed differences from the entropy curves alone, so the quantitative measurements in Table 1 are helpful for numeric comparison. The entropy variation ratios fall into three ranges: $Q_e^R$: [0.26, 0.35]%, $P_e^R$: [0.19, 0.27]%, and $PQ_e^R$: [0.12, 0.20]%. The three Max-Min values of {$Q_e^R$, $P_e^R$, $PQ_e^R$} are bounded in [0.08, 0.09]%. The whole structure illustrates measurable stationary properties. In Table 2, it is interesting to notice that $Q_e + P_e \sim PQ_e$.
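One hedged way to read the relation between the sum of the two 1D entropies and the 2D entropy: the 2DPQ histogram is the joint distribution of the pairs $(p_i, q_i)$, and the 1DP and 1DQ histograms are its marginals, so the standard subadditivity of Shannon entropy applies. This is a general property of entropy, stated here as context rather than as a result taken from Table 2:

$$\mathrm{2DPQ}_e \le \mathrm{1DP}_e + \mathrm{1DQ}_e, \quad \text{with equality if and only if } p_i \text{ and } q_i \text{ are statistically independent.}$$

Under this reading, a sum close to the joint entropy would suggest only weak dependence between the two measures.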

All variation measurements show distinct stationary randomness that can be measured by entropy approaches.

#### **6 Conclusion**

Information entropy is a useful measurement to determine stationary randomness. Using three quantum random sequences, distinct stationary randomness can be identified from both variant maps and numeric measurements. Further investigations are required to explore theoretical boundaries on variant maps under various conditions of stationary properties.

**Acknowledgements** Thanks to the Key Project of Quantum Communication of Yunnan Province, the National Science Foundation of China (61362014) and the High-Level Overseas Professional Project of Yunnan Province for financial support of this project. Thanks to the Key Laboratory of Quantum Information, USTC, and CAS for providing quantum random sequences.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Visual Maps of Variant Combinations on Random Sequences**

**Jeffrey Zheng and Jie Wan**

**Abstract** Random sequences play a key role in network security applications. Randomness testing schemes are very important for ensuring the randomness quality of relevant sequences. This chapter proposes a visual scheme based on variant construction that measures sequences and intuitively shows some combinatorial properties of key streams generated by stream ciphers. Basic models are described. This scheme provides a flexible framework for applying the variant measure method to the key streams of stream ciphers and describing randomness in various combinatorial maps.

**Keywords** Visual scheme · Variant measure · Combinatorial projection · Random sequence

# **1 Introduction**

Random numbers play an important role in many network protocols and encryption schemes in various network security applications [1], for example, visual cryptography, digital signatures, authentication protocols and stream ciphers. To determine whether a random sequence is suitable for a cryptographic application, NIST has published a series of statistical tests as standards.

J. Zheng (B) Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

J. Zheng Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

J. Wan The People's Bank of China, Kunming, China e-mail: wanjiech@163.com

© The Author(s) 2019 J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_22

In network security applications, stream ciphers play a key role because they have higher throughput and are easier to implement than block ciphers [2]. RC4, a well-known stream cipher, is suitable for large packets in wireless LANs [3]. It has been used for encrypting internet traffic in network protocols such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Wi-Fi Protected Access (WPA), etc. [2].

The eSTREAM project collected stream ciphers from the international cryptology community [4] to promote the design of efficient and compact stream ciphers suitable for widespread adoption. After a series of tests, the algorithms submitted to eSTREAM were selected into two profiles: one more suitable for software and the other more suitable for hardware. Non-linear structures and recursion play essential roles in new developments.

Different visual schemes are required to test the randomness of random sequences from different stream ciphers. Along this direction, this chapter proposes a flexible framework to handle a set of meta measurements on different combinatorial projections.

#### **2 Variant Combinatorial Visualization**

The architecture of variant visualization is shown in Fig. 1.

The variant visualization architecture is separated into four core components: EAC, SCC, CC and VC.

CC: Combinatorial Component; VC: Visualization Component

**Fig. 1** Visualization architecture

The input *n* is the length of the binary sequence. The stream cipher can be replaced by any stream cipher that generates a binary sequence. This section focuses on the variant measure module and the visual method module.

A visual example of RC4 will be described in Sect. 2.5.

#### *2.1 Variant Logic Framework*

The variant logic framework has been proposed in [6]. Li [7] used the variant measure method to generate different symmetry results [5] based on cellular automata schemes [8]. Under such construction, even some random sequences show symmetry properties in distributions.

Under variant construction, the variant conversion operator can be defined as follows:

$$C(x, y) = \begin{cases} \bot, & x = 0,\ y = 0 \\ +, & x = 0,\ y = 1 \\ -, & x = 1,\ y = 0 \\ \top, & x = 1,\ y = 1 \end{cases} \tag{1}$$
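Equation (1) is a lookup on a pair of bits. The sketch below is a minimal illustration with hypothetical names (`VARIANT_SYMBOLS`, `convert`, `convert_pairs`). How the two bit sequences *x* and *y* are obtained from a key stream is not specified at this point in the text, so they are simply taken as two equal-length inputs.

```python
# Variant conversion operator C(x, y) from Eq. (1) as a lookup table
VARIANT_SYMBOLS = {
    (0, 0): "⊥",
    (0, 1): "+",
    (1, 0): "-",
    (1, 1): "⊤",
}

def convert(x, y):
    """Map one pair of bits to its variant symbol."""
    return VARIANT_SYMBOLS[(x, y)]

def convert_pairs(xs, ys):
    """Apply C element-wise to two equal-length bit sequences (assumed inputs)."""
    return [convert(x, y) for x, y in zip(xs, ys)]

symbols = convert_pairs([0, 0, 1, 1], [0, 1, 0, 1])
```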

It is convenient to list relevant variant logic variables shown in Table 1.

In the variant measure method, each binary sequence is converted into probabilities by counting the occurrences of each variable in {⊥, +, −, ⊤} and computing the probability of each variable. The measurement method is shown in Table 1.


**Table 1** The variant measure method

The variant measure method provides a set of results as measures of different 0–1 sequences. The following mechanism can transform stream cipher sequences into relevant measures.

The essential models of variant scheme are described as follows.

#### *2.2 VSC Variant Statistic Component*

The VSC component converts the binary sequence into a variant sequence in the VCM module and computes probabilities and entropies in the PECM module. The component is shown in Fig. 2.

#### **VCM Variant Conversion Module**

The VCM module transforms input binary sequences by the following steps:


$$\begin{aligned} G &= \{G_1, G_2, \dots, G_{n/N}\} \\ &= \{\{V_1, V_2, \dots, V_N\}, \dots, \{V_{n-N+1}, V_{n-N+2}, \dots, V_n\}\} \end{aligned}$$

Step 5. Separate each item in *G* into *N/M* parts to establish a sequence group

$$\begin{aligned} G = \{ &\{\{V_1, \dots, V_M\}, \dots, \{V_{N-M+1}, \dots, V_N\}\}, \dots, \\ &\{\{V_{n-N+1}, \dots, V_{n-N+M}\}, \dots, \{V_{n-M+1}, \dots, V_n\}\} \} \end{aligned}$$
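The two-level grouping (groups of *N* symbols, each split into *N/M* parts of *M* symbols) can be sketched as nested slicing. The function name `split_groups` is hypothetical, and two choices are assumptions made for this illustration only: *N* must be a multiple of *M*, and trailing symbols that do not fill a whole group are dropped.

```python
def split_groups(variant_seq, N, M):
    """Split a variant sequence into groups of N symbols, each group
    split into N // M parts of M symbols.

    Assumption: trailing symbols that do not fill a whole group are dropped.
    """
    if N % M != 0:
        raise ValueError("N must be a multiple of M")
    groups = []
    for start in range(0, len(variant_seq) - N + 1, N):
        group = variant_seq[start:start + N]
        groups.append([group[k:k + M] for k in range(0, N, M)])
    return groups

# Integers 1..40 stand in for the symbols V_1..V_40 (n = 40, N = 16, M = 8)
G = split_groups(list(range(1, 41)), N=16, M=8)
```

With these values, two full groups are formed and the last eight symbols are left over under the dropping assumption.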

#### **PECM Probability and Entropy Computing Module**

PECM takes a group of variant sequences, separated into several parts, and computes probabilities and entropies. The equations for computing the parameters are described in Table 1. The main steps are performed as follows:


#### *2.3 CC Combinatorial Component*

The CC component can be separated into two modules: the SM module, which performs vector selection, and the VDM module, which performs the visualization.

Visual data is a set of *E* vectors used as input for VC. From the *E* vectors, a projection is chosen as a visual vector to compute the visual result. Since each of the four *E* vectors is either selected or not, there are $2^4 = 16$ visual results.

Based on the number of variables in a combination, the combination set can be grouped into five parts, i.e., the number of selected variables in a combination ranges from 0 to 4.

Let the classification be $EC = \{EC_0, EC_1, EC_2, EC_3, EC_4\}$. Since $EC_0$ is empty, it can be ignored. Only four distributions are of concern in Sect. 2.4.
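The count of 16 visual results and the five classes can be checked by enumerating all subsets of the four entropy vectors. The names below (`E_NAMES`, `classify_combinations`) are hypothetical labels for this sketch.

```python
from itertools import combinations

# Labels for the four entropy vectors (illustrative names only)
E_NAMES = ["E⊥", "E+", "E-", "E⊤"]

def classify_combinations(names):
    """Group all subsets of names by the number of selected vectors."""
    return {k: list(combinations(names, k)) for k in range(len(names) + 1)}

ec = classify_combinations(E_NAMES)
sizes = [len(ec[k]) for k in range(5)]   # class sizes for 0..4 selected vectors
```

The class sizes follow the binomial coefficients 1, 4, 6, 4, 1, which sum to 16; the single member of the size-0 class is the empty selection.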

#### *2.4 Visualization Component*

According to the variant measure method, in the rectangular coordinate system, let $E_\bot$ be the positive axis of *X*, $E_\top$ be the negative axis of *X*, $E_+$ be the positive axis of *Y*, and $E_-$ be the negative axis of *Y*. The axes are shown in Fig. 3.

For $EC_1 = \{\{E_\bot\}, \{E_+\}, \{E_-\}, \{E_\top\}\}$, points are distributed on the axes.

For $EC_2 = \{\{E_\bot, E_+\}, \{E_\bot, E_-\}, \{E_\bot, E_\top\}, \{E_+, E_-\}, \{E_+, E_\top\}, \{E_-, E_\top\}\}$, points are distributed in the shadow area in Fig. 4.

For $EC_3 = \{\{E_\bot, E_+, E_-\}, \{E_\bot, E_+, E_\top\}, \{E_\bot, E_-, E_\top\}, \{E_+, E_-, E_\top\}\}$, points are distributed in the area of $EC_1$ and the area of $EC_2$.

For $EC_4 = \{\{E_\bot, E_+, E_-, E_\top\}\}$, points are distributed as shown in Fig. 5.

#### **Fig. 3** Visualization axis

**Fig. 4** Distribution areas of *EC*<sup>2</sup>

**Fig. 5** Distribution areas of *EC*4

#### *2.5 Example*

An example is given step by step to show how the algorithm runs. In the example, *n*, *N* and *M* are, respectively, assigned to 40, 16 and 8.


$$\begin{cases} \mathbf{D}_{\bot} = \{\mathbf{P}_{0.125} = 1\} \\ \qquad \vdots \\ \mathbf{D}_{\top} = \{\mathbf{P}_{0.25} = 0.5,\ \mathbf{P}_{0.725} = 0.5\} \end{cases}$$

**Fig. 6** *D* vectors of {+ + − + ⊥+*,* − − ⊥ + −}

$$\begin{cases} \mathbf{E}_{\bot} = -(\mathbf{P}_{0.125} \log \mathbf{P}_{0.125}) = 0.0 \\ \qquad \vdots \\ \mathbf{E}_{\top} = -(\mathbf{P}_{0.25} \log \mathbf{P}_{0.25} + \mathbf{P}_{0.725} \log \mathbf{P}_{0.725}) = 0.693147 \end{cases}$$

**Fig. 7** *E* vectors of {+ + − + ⊥ + − − ⊥ + −}
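The two entropy values in the example can be reproduced directly from the *D* vectors. The printed value 0.693147 equals ln 2, which suggests that the logarithm is natural; that is an inference from the numbers, not something stated in the text. The dictionary keys below are copied from the *D* vectors as printed, and the function name `entropy` is hypothetical.

```python
import math

def entropy(distribution):
    """Entropy of a distribution given as {value: probability},
    using the natural logarithm (inferred from the printed result)."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)

d_bot = {0.125: 1.0}             # D_bot: one value with probability 1
d_top = {0.25: 0.5, 0.725: 0.5}  # D_top: two values, probability 0.5 each

e_bot = entropy(d_bot)
e_top = entropy(d_top)
```

Only the probabilities enter the entropy, so the values used as keys do not affect the result.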

**Fig. 8** Visual result of the example


## **3 Result**

#### *3.1 Visual Result of RC4*

Initial parameters {**n**: 128,000, **N**: 128, **M**: 16}: the visual result is shown in Fig. 9.

Initial parameters {**n**: 128,000, **N**: 128, **M**: 24}: the visual result is shown in Fig. 10.

Initial parameters {**n**: 128,000, **N**: 1000, **M**: 8}: the visual result is shown in Fig. 11.

Initial parameters {**n**: 100,000, **N**: 100, **M**: 24}: the visual result is shown in Fig. 12.

**Fig. 9** Visual result of RC4 {**n** : 128000*,* **N** : 128*,* **M** : 16}

#### *3.2 Visual Result of HC256*

Initial parameters {**n**: 128,000, **N**: 128, **M**: 16}: the visual result is shown in Fig. 13.

Initial parameters {**n**: 128,000, **N**: 128, **M**: 24}: the visual result is shown in Fig. 14.

Initial parameters {**n**: 100,000, **N**: 100, **M**: 8}: the visual result is shown in Fig. 15.

Initial parameters {**n**: 100,000, **N**: 100, **M**: 16}: the visual result is shown in Fig. 16.

**Fig. 10** Visual result of RC4 {**n** : 128000*,* **N** : 128*,* **M** : 24}

**Fig. 11** Visual result of RC4 {**n** : 128000*,* **N** : 1000*,* **M** : 8}

**Fig. 12** Visual result of RC4 {**n** : 100000*,* **N** : 100*,* **M** : 24}

**Fig. 13** Visual result of HC256 {**n** : 128000*,* **N** : 128*,* **M** : 16}

**Fig. 14** Visual result of HC256 {**n** : 128000*,* **N** : 128*,* **M** : 24}

**Fig. 15** Visual result of HC256 {**n** : 100000*,* **N** : 100*,* **M** : 8}

**Fig. 16** Visual result of HC256 {**n** : 100000*,* **N** : 100*,* **M** : 16}

# **4 Conclusion**

The visual results show similar symmetry properties for sequences generated by RC4 and HC256. The sequences show interesting distributions and can be clearly distinguished by their combinatorial maps. From our models and illustrations, various maps can be integrated by their combinatorial projections to show different spatial distributions of random sequences. Under this configuration, the variant measure method provides a new analysis tool for further exploration of stream cipher applications.

**Acknowledgements** This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014) and Yunnan Advanced Overseas Scholar Project.

# **References**



# Part VIII Applications—DNA Sequences

Random numbers should not be generated with a method chosen at random.

—Donald Knuth

Natural selection is anything but random.

—Richard Dawkins

Biology is the most powerful technology ever created. DNA is software, proteins are hardware, cells are factories.

—Arvind Gupta

Initial approaches to variant construction on DNA sequences have been developed since 2012. Examples include: Randomness Measurement of Pseudorandom Sequence Using different Generation Mechanisms and DNA Sequence, Journal of Chengdu University of Information Technology 27(6): 548–555, 2012; 2D Conjugate Maps of DNA Sequences, Journal of Information Security Vol. 4 No. 4 (2013), https://doi.org/10.4236/jis.2013.44021; Pseudo DNA Sequence Generation of Non Coding Distributions Using Variant Maps on Cellular Automata, Applied Mathematics 5: 153–174, 2014; Variant Map Construction to Detect Symmetric Properties of Genomes on 2D Distributions, J Data Mining Genomics Proteomics 5:150, 2014; Variant Maps to Identify Coding and Non-coding DNA Sequences of Genomes Selected from Multiple Species, Biol Syst Open Access 2016, 5:1, https://doi.org/10.4172/2329-6577.1000153; and Mapping Whole DNA Sequence on Variant Maps, ASONAM 2017: 1037–1040, https://doi.org/10.1145/3110025.3110140.

This direction contains extensive results among various applications.

This part of DNA sequences is composed of two chapters (23 and 24).

Chapter "Variant Map System to Simulate Complex Properties of DNA Interactions Using Binary Sequences" describes how to use binary sequences to simulate DNA interactions on a basis of four meta symbols. Different stream ciphers and real DNA sequences are compared, and their maps illustrate similarities and differences among the selected sequences.

Chapter "Whole DNA Sequences of Cebus capucinus on Variant Maps" applies whole DNA sequences of *Cebus capucinus* (White Face Monkey) to variant maps. This set of maps shows various distributions with complex characteristics. Further research is required.

# **Variant Map System to Simulate Complex Properties of DNA Interactions Using Binary Sequences**

**Jeffrey Zheng, Weiqiong Zhang, Jin Luo, Wei Zhou and Ruoyu Shen**

**Abstract** Stream ciphers, DNA cryptography and DNA analysis are among the most important R&D fields in cryptography and bioinformatics. HC-256 is an emerging scheme in the new generation of stream ciphers for advanced network security. From a random sequencing viewpoint, both HC-256 sequences and real DNA data may have intrinsic pseudo-random properties. In the past decade, many DNA sequencing projects on cells, plants and animals have been developed around the world, producing huge DNA databases. Researchers have noticed that mammalian genomes encode thousands of large noncoding RNAs (lncRNAs), many of which interact with chromatin regulatory complexes and are thought to play a role in localizing these complexes to target loci across the genome. It is a challenging target to use higher dimensional visualization tools to organize various complex interactive properties as visual maps. The Variant Map System (VMS) is systematically proposed in this chapter as an emerging scheme that applies multiple maps using four Meta symbols, in the same way as DNA or RNA representations. The system architecture, key components and core mechanism of the VMS are described. Key modules, equations and their I/O parameters are discussed. Applying the VMS, two sets of real DNA sequences from sample human (noncoding DNA) and corn (coding DNA) genomes are collected and compared with pseudo DNA sequences generated by HC-256 to show their intrinsic properties in higher levels of similar relationships among relevant DNA sequences on 2D maps. Sample 2D maps are listed and their characteristics are illustrated under a controllable environment. Visual results are briefly analyzed to explore intrinsic properties of the selected genome sequences.

J. Zheng (B) Key Laboratory of Yunnan Software Engineering, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

W. Zhang School of Software and Microelectronics, Peking University, Beijing, China

J. Luo School of Life Sciences, Yunnan University, Kunming, China

W. Zhou · R. Shen School of Software, Yunnan University, Kunming, China

© The Author(s) 2019 J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_23

**Keywords** Pseudo-random number generator · Stream cipher · HC-256 · Binary to DNA · Pseudo DNA sequence · Large noncoding · DNA analysis · 2D map · Visual distribution · Variant map system

#### **1 Introduction**

Stream ciphers [1, 2] play a key role in modern network security [3, 4], especially in multimedia network environments; their core component, the pseudo-random number generation mechanism [5–7], takes a central position in modern cryptography [8, 9]. Along with the development of bioinformatics, advanced DNA sequencing and analysis techniques [10, 11] have progressed significantly over the past decade.

#### *1.1 DNA Cryptography*

DNA cryptography combines research in the fields of DNA computing and cryptography. Scholars around the world have focused on this field, and various results have been published, such as simulating DNA evolution [12], DNA pseudorandom number generators [13–16], DNA cryptography [9, 17, 18] and so on. However, DNA cryptography is currently still at an early stage as an emerging area of advanced cryptography.

In typical DNA cryptography results on encryption, different coding schemes can be selected. For example, the algorithm in [17] applies the encoding {00→*C*, 01→*T*, 10→*A*, 11→*G*} to express the plaintext as a DNA sequence, whereas in [18] the same author uses the encoding {00→*A*, 01→*T*, 10→*C*, 11→*G*}. In an encryption environment, all 4! = 24 possible encoding methods could equally be used in different applications.
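The two encodings quoted from [17] and [18], and the count of 24 possible encodings, can be illustrated with a short Python sketch. The names `CODING_17`, `CODING_18`, `bits_to_dna` and `all_codings` are hypothetical and are not taken from the cited papers.

```python
from itertools import permutations

# The two encodings quoted above from [17] and [18]
CODING_17 = {"00": "C", "01": "T", "10": "A", "11": "G"}
CODING_18 = {"00": "A", "01": "T", "10": "C", "11": "G"}

def bits_to_dna(bits, coding):
    """Translate a bit string into a DNA string, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(coding[bits[i:i + 2]] for i in range(0, len(bits), 2))

def all_codings():
    """Enumerate every one-to-one assignment of the four bit pairs to bases."""
    keys = ["00", "01", "10", "11"]
    return [dict(zip(keys, perm)) for perm in permutations("ACGT")]

dna_17 = bits_to_dna("00011011", CODING_17)
dna_18 = bits_to_dna("00011011", CODING_18)
```

The same bit string yields different DNA strings under the two encodings, which is why the chosen encoding has to be fixed as a parameter of any binary-to-DNA conversion.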

#### *1.2 Stream Cipher HC-256*

Stream ciphers are an important class of encryption algorithms. A stream cipher is a symmetric cipher that operates with a time-varying transformation on individual plaintext digits. The ECRYPT Stream Cipher Project (eSTREAM) [1] was a multi-year effort, running from 2004 to 2008, to promote the design of efficient and compact stream ciphers suitable for widespread adoption. **HC-256** is a stream cipher designed to provide bulk encryption in software at high speeds while permitting strong confidence in its security. A 128-bit variant was submitted in 2004 as an eSTREAM cipher candidate; in 2008 it was selected as one of the four final contestants in the software profile [2, 4] as the most advanced scheme for stream cipher applications in advanced network environments.

#### *1.3 Large Noncoding DNA and RNA*

In relation to DNA analysis, visualization methods played a key role in the Human Genome Project (HGP) [19]. After the HGP was completed successfully, a public research consortium, the Encyclopedia of DNA Elements (ENCODE), was launched by the National Human Genome Research Institute (NHGRI) in 2003 to find all functional elements in the human genome, as one of the most critical NHGRI projects to explore genomes after the HGP.

In 2012, ENCODE released a coordinated set of 30 papers published in the journals Nature, Genome Biology and Genome Research. These publications show that approximately 20% of noncoding DNA in the human genome is functional while an additional 60% is transcribed with no known function [20]. Much of this functional non-coding DNA is involved in the regulation of the expression of coding genes [10]. Furthermore, the expression of each coding gene is controlled by multiple regulatory sites located both near and distant from the gene. These results demonstrate that gene regulation is far more complex than was previously believed [11]. Mammalian genomes encode thousands of large noncoding RNAs (lncRNAs), many of which regulate gene expression, interact with chromatin regulatory complexes, and are thought to play a role in localizing these complexes to target loci across the genome [21]. In association with different international projects, large numbers of genome databases have been established and massive genome-wide gene expression measurements have been developed.

Due to the huge number of DNA sample collections and the extreme difficulty of determining their variation properties in wider applications [19, 22–27], it is essential to extend advanced DNA analysis models, methods and tools, and to explore emerging models and concepts that interpret complex interactions among complicated sets of DNA sequences in real environments.

#### *1.4 DNA Analysis*

DNA analysis plays a key role in modern genomic applications [19]. The HGP is closely tied to advanced DNA sequencing and analysis techniques. DNA sequences are composed of four Meta symbols {*A, T, G, C*} as their basic structure. The classical DNA double helix structure forms the first level of pair construction of DNA sequences, with the A & T and G & C complementary structures as the first level of symmetric relationships. A typical DNA sequencing result is shown in Fig. 1a. The four Meta symbols can be separated into four projective sequences.

In ENCODE, recent genomic analysis results indicate that encoded sequences make up only about 20% of human genomes, while around 80% of the genome looks like useless sequence. Under further assumptions, it seems that additional symmetric properties are required to satisfy the second, third and higher levels of structural constructions to explore complex interactive properties [10, 11, 19–29].

In the current situation, it is necessary for advanced researchers to shift targets in computational cell biology from directly collecting sequential data to making higher-level interpretations and exploring efficient content-based retrieval mechanisms for genomes. Using higher dimensional visualization tools, complex interactive properties could be organized systematically as different visual maps.

#### *1.5 Variant Construction and DNA*

Variant construction is a new structure composed of logic, measurement and visualization models to analyze 0–1 sequences under variant conditions. Further details of this construction can be found in work on variant logic [30, 31], 2D maps [32, 33], the variant pseudo-random number generator [34], DNA maps [35] and variant phase spaces [33]. Since the variant system uses another set of four Meta symbols {⊥, +, −, ⊤} to describe a system, a typical correspondence shown in Fig. 1b may provide a natural mapping between DNA and variant data sequences.

Since DNA sequences play an essential role in exploring different symmetric properties with analysis approaches, measurement and visual models are proposed systematically in this chapter, using a fixed segment structure to measure the distributions of the four Meta symbols in their spectrum construction. Under this construction, refined symmetric features can be identified from various polarized distributions and further symmetric properties are visualized.

#### *1.6 Target of This Chapter*

The target of this chapter is to establish the Variant Map System (VMS) as a unified framework to analyze complex DNA interactions on both artificial and natural DNA sequences. The VMS is designed to use variant logic schemes [30–35], applying multiple maps on four Meta symbols as DNA or RNA representations. The system architecture, key components and core mechanism of the VMS are described. Key modules, equations and their I/O parameters are discussed. Applying the VMS, two sets of real DNA sequences from human (noncoding DNA) and corn (coding DNA) genomes are collected and compared with pseudo DNA sequences generated artificially by HC-256 to show their intrinsic properties in higher levels of similar relationships among DNA sequences on 2D maps. Further descriptions and discussions are provided.

#### **2 System Architecture**

In this section, system architecture and their core components are discussed with the use of diagrams. The refined definitions and equations of this system are described in the next section—Variant Map System.

#### *2.1 Architecture*

The four components of a variant map system are the Binary To DNA (BTD), the Binary Probability Measurement (BPM), the Mapping Position (MP), and the Visual Map (VM) as shown in Fig. 2.

The architecture is shown in Fig. 2a with the key modules of the four core components being shown in Fig. 2b–e respectively.

In the first part of the system, the *t*-th sequence $Y^t$ on either {0, 1} or {*A, G, T, C*} is the input data for the BTD module. The main function of the BTD is to output a unified sequence $X^t$, either by transforming a 0–1 sequence into a pseudo DNA sequence or by keeping a DNA sequence as a pure DNA sequence, under a set of controlled parameters.

Using this unified DNA sequence, four vectors of probability measurements are created from the *t*-th selected DNA sequence with $N_t$ elements as input. The sequence is partitioned into segments with a fixed number of *n* elements each; at least $m_t$ segments can be identified by the BPM component. The next component, MP, uses the four vectors of probability measurements and a given *k* value as input data and creates a pair of position values for each Meta symbol, so four pairs of values are generated. Then, in order to process multiple selected DNA sequences, all selected sequences are processed by the VM component, and each sequence may provide a set of pair values to generate relevant variant maps that indicate their distribution properties.

**Fig. 2** Variant Map System (VMS) and key components. **a** Architecture; **b** BTD component; **c** BPM component; **d** MP component; **e** VM component

The system has eight parameters in the input group, three sets of parameters in the intermediate group, and one set of parameters in the output group.

The three groups of parameters are listed as follows.

#### **Input Group**:


#### **Intermediate Group**:


#### **Output Group**:


## *2.2 BTD Binary to DNA*

The BTD component shown in Fig. 2b is composed of one module: BTD itself. Five parameters are shown as input signals and one unified vector is generated by the BTD component as the output group.

#### **Input Group**:


#### **Output Group**:

*X<sup>t</sup>* A unified data vector with *N<sub>t</sub>* elements, *X<sup>t</sup>* ∈ *D<sup>N<sub>t</sub></sup>*

The BTD component takes an input vector in either binary or DNA format and transforms it under a set of input parameters. Its output is a unified vector in DNA format under the given conditions.

#### *2.3 BPM Binary Probability Measurement*

The BPM component shown in Fig. 2c is composed of two modules: BM Binary Measure and PM Probability Measurement. Three parameters are listed as input signals; four vectors of binary measures are outputted from the BM component as an intermediate group and four sets of probability measurements are outputted as an output group.

#### **Input Group**:


#### **Intermediate Group**:


#### **Output Group**:


The BPM component transforms a selected DNA sequence to generate four 0–1 vectors by BM module for the input DNA sequence. Then four probability vectors are generated by the PM module as the output of the BPM under a fixed length of segment condition.

#### *2.4 MP Mapping Position*

The MP component shown in Fig. 2d is composed of three modules: HIS Histogram, NH Normalized Histogram and PP Pair Position. Two parameters are listed as input signals; four histograms and four normalized histograms are generated from the HIS component and the NH component as intermediate groups respectively. Four paired values are generated by the PP component as the output group.

#### **Input Group**:


*k* An integer indicates the control parameter for mapping, *k* > 0

#### **Intermediate Group**:


#### **Output Group**:

(*x<sub>V</sub><sup>k</sup>*, *y<sub>V</sub><sup>k</sup>*) Four paired values, *k* > 0, *V* ∈ *D*

The MP component takes the probability measurements as input and, for a given *k*, generates each relevant histogram and its normalized distribution. The output of the MP component is composed of four paired values controlled by the given condition.

#### *2.5 VM Visual Map*

The VM component shown in Fig. 2e is composed of one module: VM Visual Map. Three parameters are input signals. After all selected DNA sequences have been collected, four 2D maps are generated by the VM component as the output result.

#### **Input Group**:

∀*t* All DNA sequences are selected, 0 ≤ *t* < *T*

*Y<sup>t</sup>* An input data vector with *N<sub>t</sub>* elements, *Y<sup>t</sup>* ∈ {*D<sup>N<sub>t</sub></sup>* | mode = 0, *B<sup>N<sub>t</sub></sup>* | mode = 1}

(*x<sub>V</sub><sup>k</sup>*, *y<sub>V</sub><sup>k</sup>*)<sup>*t*</sup> Four paired values for the *t*-th DNA sequence, *k* > 0, *V* ∈ *D*

#### **Output Group**:


The VM component processes all selected DNA sequences as input to generate paired values for each sequence. The output of the VM component is composed of four 2D maps to show the final visual distribution for the system.

#### **3 Variant Map System**

In this section, definitions and equations are provided to describe the VMS. In addition to the initial preparation, seven core modules are involved in the BTD, BM, PM, HIS, NH, PP and VM components respectively.

#### *3.1 Initial Preparation*

Let *r* be an input parameter that pairs each element of a binary sequence with the element at distance *r* to form a pseudo DNA vector, and let mode be a control parameter indicating that such pair operations are performed if mode ≥ 1. Denote by *B* = {0, 1} the binary base and by *D* = {*A*, *G*, *T*, *C*} the DNA base.

#### *3.2 BTD Module*

Let *Y* be an input sequence with *N* elements, 0 ≤ *I* < *N*, where *Y*(*I*) ∈ *B* if mode ≥ 1 and *Y*(*I*) ∈ *D* if mode = 0. This input vector can be expressed as follows.

$$Y = (Y(0), \dots, Y(I), \dots, Y(N-1)), \ 0 \le I < N$$

$$Y(I) \in \{\mathcal{B}^N|\_{\text{mode}\ge 1}, \ Y(I) \in \mathcal{D}^N|\_{\text{mode}=0}\}. \tag{1}$$

Let *X* denote a DNA sequence with *N* elements and *D* denote a symbol set with four elements, i.e. *D* = {*A*, *G*, *T*, *C*}. A DNA sequence of this type can be described by a four-valued vector as follows:

$$\begin{aligned} X &= (X(0), \dots, X(I), \dots, X(N-1)), \\ 0 &\le I < N, X(I) \in D = \{A, G, T, C\}, X \in D^N \end{aligned} \tag{2}$$

From this input and the associated parameters, the following operations are performed. If mode = 0, then *Y*(*I*) ∈ *D* for all *I*, and the output vector is equal to the input vector.

$$\forall I, X(I) = Y(I), 0 \le I < N \tag{3}$$

If mode = 1, for each pair of elements *Y*(*I*), *Y*(*I* + *r*) ∈ *B* of *Y*, with the index *I* + *r* taken modulo *N*, the *I*-th output element *X*(*I*) is determined by the corresponding conditions shown in Fig. 1b as follows.

$$\mathbf{X}(\mathbf{I}) = \begin{cases} \mathbf{G}, & \text{if } \mathbf{Y}(\mathbf{I}) = 0 \& \text{ } \mathbf{Y}(\mathbf{I} + \mathbf{r}) = \mathbf{0} \\ \mathbf{A}, & \text{if } \mathbf{Y}(\mathbf{I}) = 0 \& \text{ } \mathbf{Y}(\mathbf{I} + \mathbf{r}) = \mathbf{1} \\ \mathbf{T}, & \text{if } \mathbf{Y}(\mathbf{I}) = \mathbf{1} \& \text{ } \mathbf{Y}(\mathbf{I} + \mathbf{r}) = \mathbf{0} \\ \mathbf{C}, & \text{if } \mathbf{Y}(\mathbf{I}) = \mathbf{1} \& \text{ } \mathbf{Y}(\mathbf{I} + \mathbf{r}) = \mathbf{1} \end{cases} \tag{4}$$

In both conditions, *X* will be a unified vector with four values as the output of the BTD shown in Fig. 2b.

E.g. let a binary sequence *Y* = 100111001011 with *N* = 12; three pseudo DNA sequences (*r* = 1, *r* = 2, *r* = 3) can be represented as follows.

$$Y = 100111001011$$

$$\begin{aligned} X_{r=1} &= TGACCTGATACC\\ X_{r=2} &= TAACTTAGCACT\\ X_{r=3} &= CAATTCGACATT\\ Y &\in \mathcal{B}^{12}, X \in D^{12} \end{aligned}$$

Selecting a certain *r* value, a relevant pseudo DNA sequence can be generated from an input binary sequence.
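The BTD rule of Eqs. (3) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `btd`, the string-based representation of sequences and the input checks are assumptions made here, not part of the original system.

```python
# Pair-to-symbol table taken from Eq. (4): (Y(I), Y(I+r)) -> X(I)
PAIR_TO_BASE = {("0", "0"): "G", ("0", "1"): "A", ("1", "0"): "T", ("1", "1"): "C"}


def btd(y, r=1, mode=1):
    """Return the unified DNA sequence X for an input sequence Y (hypothetical helper)."""
    if mode == 0:
        # Eq. (3): a DNA input is passed through unchanged
        if set(y) - set("AGTC"):
            raise ValueError("mode 0 expects a sequence over {A, G, T, C}")
        return y
    if set(y) - set("01"):
        raise ValueError("mode >= 1 expects a sequence over {0, 1}")
    n_y = len(y)
    # Eq. (4): pair Y(I) with Y(I + r), the index being taken modulo N
    return "".join(PAIR_TO_BASE[(y[i], y[(i + r) % n_y])] for i in range(n_y))


y = "100111001011"
x_r1 = btd(y, r=1)
x_r2 = btd(y, r=2)
x_r3 = btd(y, r=3)
```

Each output has the same length as the input because the pairing wraps around at the end of the sequence.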

#### *3.3 BM Module*

For a given *I*-th element, four projective operators can be defined and denoted as {*M<sub>A</sub>*(*I*), *M<sub>G</sub>*(*I*), *M<sub>T</sub>*(*I*), *M<sub>C</sub>*(*I*)}.

$$\begin{aligned} M_A(I) &= \begin{cases} 1, & \text{if } X(I) = A; \\ 0, & \text{otherwise;} \end{cases} & M_G(I) &= \begin{cases} 1, & \text{if } X(I) = G; \\ 0, & \text{otherwise;} \end{cases} \\ M_T(I) &= \begin{cases} 1, & \text{if } X(I) = T; \\ 0, & \text{otherwise;} \end{cases} & M_C(I) &= \begin{cases} 1, & \text{if } X(I) = C; \\ 0, & \text{otherwise.} \end{cases} \end{aligned} \tag{5}$$

Applying the four operators to all elements, the DNA sequence *X* can be reorganized into the four binary sequences of 0–1 values. i.e.

$$\begin{aligned} M\_V: \{X(I)\}\_{I=0}^{N-1} &\to \{M\_A(I), M\_G(I), M\_T(I), M\_C(I), \}\_{I=0}^{N-1};\\ M\_V(I) \in B &= \{0, 1\}, V \in D \end{aligned} \tag{6}$$

E.g. let a DNA sequence *X* = *CTGATTAGCCAT* with *N* = 12; its four binary sequences can be represented as follows.

$$\begin{aligned} X &= CTGATTAGCCAT\\ M\_A &= 000100100010\\ M\_G &= 001000010000\\ M\_T &= 010011000001\\ M\_C &= 100000001100 \end{aligned}$$

It is interesting to notice that the basic relationship between a DNA sequence *X* and its four *M<sub>V</sub>* sequences is exactly the same as in a modern DNA sequencing procedure, which separates a selected DNA sequence into the four Meta symbol sequences shown in Fig. 1a. This correspondence could be the key feature that allows the proposed scheme to be applied naturally in simulating complex behaviors of any DNA sequence.

The projection *MV* provides the essential operation in the BM component as the first module shown in Fig. 2c.
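As a minimal sketch of the projection in Eqs. (5) and (6), the four 0–1 sequences can be computed as below. The function name `binary_measures` and the dictionary layout are illustrative assumptions.

```python
def binary_measures(x):
    """Project a DNA string X onto four 0-1 lists M_A, M_G, M_T, M_C (hypothetical helper)."""
    # M_V(I) = 1 if X(I) = V, otherwise 0, for each V in D = {A, G, T, C}
    return {v: [1 if c == v else 0 for c in x] for v in "AGTC"}


measures = binary_measures("CTGATTAGCCAT")
as_text = {v: "".join(str(b) for b in bits) for v, bits in measures.items()}
```

At every position exactly one of the four sequences holds a 1, so the four lists sum to the all-ones vector.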

#### *3.4 PM Module*

For this set of four binary sequences, it is convenient to partition each of them into *m* segments, each containing a fixed number of *n* elements.

For the *l*-th segment, let 0 ≤ *l* < *m* and 0 ≤ *j* < *n*, so that the *I*-th position is *I* = *l* ∗ *n* + *j*; four probability measurements {ρ<sup>*A*</sup>, ρ<sup>*G*</sup>, ρ<sup>*T*</sup>, ρ<sup>*C*</sup>} can then be defined.

$$\rho\_l^V = \frac{\sum\_{I=l\ast n}^{(l+1)\ast n-1} M\_V(I)}{n}, V \in D, 0 \le I < N = n \ast m \tag{7}$$

Under this construction, four sets of probability measurements are established.

$$\rho^{\mathcal{V}}: \{M\_A(I), M\_G(I), M\_T(I), M\_C(I), \}\_{I=0}^{N-1} \to \quad \{\rho\_l^A, \rho\_l^G, \rho\_l^T, \rho\_l^C, \}\_{l=0}^{m-1} \tag{8}$$

The probability operator ρ*<sup>V</sup>* generates four probability measurement vectors in the PM component as the second module shown in Fig. 2c. After the BM and PM processes, the whole procedure of the BPM component is complete in Fig. 2c.
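The segment-wise measurement of Eq. (7) can be illustrated with the following sketch. The name `probability_measures`, the use of exact fractions, and the choice to drop a trailing partial segment are assumptions made for the example.

```python
from fractions import Fraction


def probability_measures(m_v, n):
    """Return rho_l^V for l = 0..m-1 from one 0-1 sequence M_V (hypothetical helper).

    Assumption: any trailing elements that do not fill a whole segment are dropped.
    """
    m = len(m_v) // n
    # Eq. (7): the fraction of ones inside the l-th segment of length n
    return [Fraction(sum(m_v[l * n:(l + 1) * n]), n) for l in range(m)]


m_a = [int(c) for c in "000100100010"]  # M_A of X = CTGATTAGCCAT
rho_a_n4 = probability_measures(m_a, n=4)
rho_a_n3 = probability_measures(m_a, n=3)
```

Exact fractions make it easy to see that each value is one of the *n* + 1 levels 0, 1/*n*, …, 1.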

#### *3.5 HIS Module*

Since the BPM generates four sets of probability measurement, it is necessary to perform further operations in the MP component shown in Fig. 2d as follows.

In the HIS component, the first module in Fig. 2d, each probability sequence {ρ<sub>*l*</sub><sup>*V*</sup>}<sub>*l*=0</sub><sup>*m*−1</sup>, *V* ∈ *D*, is calculated from segments of *n* positions, so at most *n* + 1 distinct values can occur in a vector. Under this organization, a histogram distribution can be established.

Let *H*(.) be a histogram operator; for each position, it satisfies the following relation,

$$H(\rho\_l^V) = \begin{cases} 1, \text{ if } \rho\_l^V = \frac{i}{n}, \text{ } V \in D; \\ 0, \text{ Otherwise, } 0 \le i \le n. \end{cases} \tag{9}$$

Collecting all possible values, a histogram distribution can be established,

$$H(\boldsymbol{\rho}^V) = \sum\_{l=0}^{m-1} H(\boldsymbol{\rho}\_l^V) \tag{10}$$

The histogram *H*(ρ<sup>*V*</sup>) is the output of the HIS module. Four histograms are generated by the HIS process. Further normalization is performed in the NH component, the second module in Fig. 2d.

#### *3.6 NH Module*

Under this construction, a normalized histogram can be defined as

$$P\_H\left(\boldsymbol{\rho}^V\right) = H\left(\boldsymbol{\rho}^V\right) / m \tag{11}$$

After the NH component has finished processing, its output is passed to the *PP* component, the third module in Fig. 2d, for further operations.
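A minimal sketch of the HIS and NH steps in Eqs. (9) to (11) follows. It assumes the probability measurements are given as values of the form *i*/*n*; the function names are illustrative only.

```python
from fractions import Fraction


def histogram(rho, n):
    """Count how many segments take each of the n + 1 levels i/n (Eqs. 9 and 10)."""
    h = [0] * (n + 1)
    for p in rho:
        h[round(p * n)] += 1  # p = i/n, so p * n recovers the level index i
    return h


def normalized_histogram(rho, n):
    """Divide the histogram by the number of segments m (Eq. 11)."""
    m = len(rho)
    return [count / m for count in histogram(rho, n)]


rho_a = [Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
h_a = histogram(rho_a, n=3)
p_a = normalized_histogram(rho_a, n=3)
```

Because every segment falls into exactly one level, the normalized values sum to one.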

#### *3.7 PP Module*

The relevant probability vectors have (*n* + 1) distinct values; four sets of normalized vectors can be organized in a linear order as follows,

$$p_i^V = \sum_{l=0}^{m-1} H\left(\rho_l^V \mid \rho_l^V = \frac{i}{n}\right) / m, \ 0 \le i \le n \tag{12}$$

Under this condition, four linear sets of probability vectors are established,

$$\begin{aligned} P\_H(\boldsymbol{\rho}^V) &= \{p\_i^A, p\_i^G, p\_i^T, p\_i^C, \}\_{i=0}^n, \\ p\_i^V &\in [0, 1], \; V \in D, \; 0 \le i \le n \end{aligned} \tag{13}$$

For each of the four vectors, the components sum to one,

$$\sum\_{i=0}^{n} p\_i^V = 1, \ V \in D \tag{14}$$

Each of the four probability vectors therefore forms a complete partition of its measurements.

Using this set of measurements, two mapping functions can be established to calculate a pair of values that maps the analyzed DNA sequence into a 2D map as follows.

Let *y* = *F*(*P*, *V*, *k*) and *x* = *F*(*P*, *V*, 1/*k*), written (*x<sub>V</sub><sup>k</sup>*, *y<sub>V</sub><sup>k</sup>*), be a pair of values defined by the following equations,

$$y_V^k = F(P, V, k) = \left(\sum_{i=0}^n \sqrt[k]{p_i^V}\right)^k$$

$$x_V^k = F(P, V, 1/k) = \sqrt[k]{\sum_{i=0}^n \left(p_i^V\right)^k}, V \in D \tag{15}$$

In the *PP* component, four paired values are generated and each pair indicates a specific position on a 2D map for the selected DNA sequence. The core operations of three key components: BTD, BPM and MP for a selected sequence are performed in Fig. 2b–d.
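The pair of mapping functions in Eq. (15) can be sketched as follows. The function name `pair_position` is an assumption, and plain floating-point arithmetic is used for illustration.

```python
def pair_position(p, k):
    """Return (x, y) for one normalized histogram p = (p_0, ..., p_n) and a given k > 0."""
    # y = F(P, V, k): sum of k-th roots, raised to the k-th power
    y = sum(p_i ** (1.0 / k) for p_i in p) ** k
    # x = F(P, V, 1/k): k-th root of the sum of k-th powers
    x = sum(p_i ** k for p_i in p) ** (1.0 / k)
    return x, y


x_uniform, y_uniform = pair_position([0.25, 0.25, 0.25, 0.25], k=2)
x_single, y_single = pair_position([1.0, 0.0, 0.0, 0.0], k=2)
```

For a histogram concentrated on a single level both values equal 1, while a uniform histogram over *n* + 1 levels gives *y* = (*n* + 1)<sup>*k*−1</sup>, which is why *y* grows quickly with *k*.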

#### *3.8 VM Module*

Since only one point on a 2D map is determined for a selected DNA sequence, it is essential to use a relatively large number of DNA sequences as inputs to generate visible distributions. This type of operation is performed in the VM component shown in Fig. 2e.

In a general condition, the VM component processes a selected data set {*Y<sup>t</sup>*}<sub>*t*=0</sub><sup>*T*−1</sup> composed of *T* sequences. The *t*-th sequence with *N<sub>t</sub>* elements can be expressed as *Y<sup>t</sup>* = (*Y<sup>t</sup>*(0), …, *Y<sup>t</sup>*(*I*), …, *Y<sup>t</sup>*(*N<sub>t</sub>* − 1)), where *Y<sup>t</sup>* ∈ *B<sup>N<sub>t</sub></sup>* if mode ≥ 1 and *Y<sup>t</sup>* ∈ *D<sup>N<sub>t</sub></sup>* if mode = 0. Each sequence is processed by the same procedures of the BTD, BPM and MP components. Since the segment length *n* is fixed for all selected sequences, the number of segments is set by convention to *m<sub>t</sub>* = *N<sub>t</sub>*/*n* to match each sequence. Under this expression, the last module VM collects all *T* pairs of positions on the relevant 2D visual maps as follows,

$$\mathrm{VM}: \left\{X^{t}\right\}_{t=0}^{T-1} \to \left\{\left(x_{V}^{k}, y_{V}^{k}\right)^{t}\right\}_{t=0}^{T-1} \to \left\{\mathrm{MAP}_{V}\right\}, V \in D \tag{16}$$

A sample 2D map of the VM is shown in Fig. 3; it illustrates this type of visual map for a case of multiple sequences.

Under this construction, a total of *T* DNA sequences are transformed into *T* visual points on each of four 2D visual maps, which can help analysts explore the intrinsic symmetry properties among the four binary sequences.
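The whole chain of Eq. (16) can be sketched end to end as below. This is a compact illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, a trailing partial segment is dropped (integer division for *m<sub>t</sub>*), and the points are simply collected in lists instead of being plotted.

```python
# Pair-to-symbol table from Eq. (4): (Y(I), Y(I+r)) -> X(I)
PAIR_TO_BASE = {("0", "0"): "G", ("0", "1"): "A", ("1", "0"): "T", ("1", "1"): "C"}
BASES = "AGTC"


def btd(y, r, mode):
    """BTD: keep a DNA string (mode 0) or build a pseudo DNA string from bits."""
    if mode == 0:
        return y
    n_y = len(y)
    return "".join(PAIR_TO_BASE[(y[i], y[(i + r) % n_y])] for i in range(n_y))


def pair_position(x_seq, v, n, k):
    """BPM + MP for one symbol V: segment counts -> normalized histogram -> (x, y)."""
    m = len(x_seq) // n  # assumption: a trailing partial segment is dropped
    if m == 0:
        raise ValueError("sequence is shorter than one segment")
    hist = [0] * (n + 1)
    for l in range(m):
        hist[x_seq[l * n:(l + 1) * n].count(v)] += 1  # level i = number of V in segment l
    p = [h / m for h in hist]
    x = sum(p_i ** k for p_i in p) ** (1.0 / k)
    y = sum(p_i ** (1.0 / k) for p_i in p) ** k
    return x, y


def variant_maps(sequences, n, k, r=1, mode=0):
    """VM: collect one (x, y) point per sequence on each of the four maps."""
    maps = {v: [] for v in BASES}
    for y_seq in sequences:
        x_seq = btd(y_seq, r, mode)
        for v in BASES:
            maps[v].append(pair_position(x_seq, v, n, k))
    return maps


maps = variant_maps(["100111001011", "000000000000"], n=3, k=2, r=1, mode=1)
```

Each list in `maps` holds *T* points, one per input sequence, which could then be passed to any 2D scatter or density plot.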

#### **4 Sample Results on 2D Maps**

Two types of data sets are selected for comparison. The first type consists of real DNA sequences collected from both human and plant genomes to illustrate their differences on 2D maps. The second type is collected from the Stream Cipher HC-256, which generates a pseudo random binary sequence under a certain condition.

#### *4.1 DNA Data Resources*

It is important to use some real DNA sequences to illustrate various test results of the VMS. Two sets of DNA sequences are selected and relevant resource features are described as follows.

The first data set originally comes from the human genome assembly version 37 and was taken from the reference sequences of 13 anonymous volunteers from Buffalo, New York. Hi-C technology [5] was used to analyze the role of chromatin interaction in the genome. From a genomic analysis viewpoint, this set of data may contain more complex secondary or higher-level structures. A special structure near the GRCh37 DNA sequence has been identified to explore its spatial characteristics. After positive and negative sequencing, each data file contains 2700 DNA sequences of around 500 elements each, stored in two files, *left* and *right*, respectively.

The second DNA data set is selected from a plant gene database for comparison. One set of DNA sequences from corn genomes is stored in the file 201–500, which contains 2700 DNA sequences of around 200–600 elements each. These may be ordinary single sequences without complex secondary structures.

#### *4.2 Pseudo DNA Data Resources*

The Stream Cipher HC-256 has been used to generate a binary sequence with a total length of 2700 × 500 bits in the file *hc256*, which has been partitioned into 2700 subsequences of 500 bits each.

Using the VMS with various parameters, three sets of pseudo DNA sequences are generated and their 2D maps are illustrated, analyzed and compared in the following subsections.

#### *4.3 Sample Results*

Using the three files of DNA sequences and one pseudo binary sequence in three parameters, six sets of 2D maps are listed in Figs. 4, 5, 6, 7, 8 and 9 under different conditions to illustrate their spatial distributions using the VMS in a controllable environment.

In Fig. 4, three groups of eighteen 2D maps are shown in the range of *n* = 3 ∼ 50, *k* = 7, *N* ∼ 200 ∼ 600, *T* = 2700 for comparison; (a1–a6) six Map<sub>*A*</sub> maps for the file *right*; (b1–b6) six Map<sub>*G*</sub> maps for the file 201–500; (c1–c6) six Map<sub>*A*</sub> maps for the file *hc256*.

In Fig. 5, four groups of sixteen 2D maps for the file *right* are listed in the range of *n* = 15, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700; (a) group (a1–a4) four Map<sub>*A*</sub> maps; (b) group (b1–b4) four Map<sub>*T*</sub> maps; (c) group (c1–c4) four Map<sub>*G*</sub> maps; (d) group (d1–d4) four Map<sub>*C*</sub> maps.

In Fig. 6, four groups of sixteen 2D maps for the file *hc256* are listed in the range of *n* = 12, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700, *r* = 1, mode = 1; (a) group (a1–a4) four Map<sub>*A*</sub> maps; (b) group (b1–b4) four Map<sub>*T*</sub> maps; (c) group (c1–c4) four Map<sub>*G*</sub> maps; (d) group (d1–d4) four Map<sub>*C*</sub> maps.

In Fig. 7, two groups of eight 2D maps for the files *left* and *right* are selected in the range of *n* = 15, *k* = 7, *N* ∼ 500, *T* = 2700; (a) group (a1–a4) four Map<sub>*V*</sub> maps for the file *left*; (b) group (b1–b4) four Map<sub>*V*</sub> maps for the file *right*.

In Fig. 8, three groups of twelve 2D maps for the file *hc256* are compared in the range of *n* = 12, *k* = 7, *N* ∼ 500, *T* = 2700, *r* = {1, 2, 3}, mode = 1; (a) group (a1–a4) four Map<sub>*V*</sub> maps for *r* = 1; (b) group (b1–b4) four Map<sub>*V*</sub> maps for *r* = 2; (c) group (c1–c4) four Map<sub>*V*</sub> maps for *r* = 3.

In Fig. 9, three groups of twelve 2D maps for the two files *right* and *hc256* are compared in the range of *k* = 7, *N* ∼ 500, *T* = 2700; (a) the file *right*, *n* = 15, mode = 0; (b) the file *hc256*, *n* = 12, mode = 1, *r* = 1; (c) the file *hc256*, *n* = 12, mode = 1, *r* = 3; (a1–c1) Map<sub>*A*</sub> maps; (a2–c2) Map<sub>*T*</sub> maps; (a3–c3) Map<sub>*G*</sub> maps; (a4–c4) Map<sub>*C*</sub> maps.

#### *4.4 Result Analysis of 2D Maps*

The six groups of 2D maps contain different information, so a brief discussion of their important features is given as follows.

The first group of results, shown in Fig. 4, presents three sets of eighteen 2D maps from three data files, *right*, 201–500 and *hc256*, with basic segment lengths varying from 3 to 50 to illustrate their variations. The six 2D maps in Fig. 4 (a1–a6) show significant traces in their visual distributions; the number of main visible clusters decreases as the segment length increases, e.g. (a3–a6). However, shorter segment lengths do not provide refined visual distinctions and show larger fuzzy regions, e.g. (a1–a2). From a structural viewpoint, mid-range segment lengths provide better clustering results, e.g. (a3–a5), for further analysis.

For the six 2D maps in Fig. 4 (b1–b6) for the file 201–500, significantly different visual distributions can be observed compared with (a1–a6); the number of main visible clusters decreases less significantly as the segment length increases, e.g. (b4–b6). However, shorter segment lengths do not provide refined visual distinctions and show wider fuzzy regions, e.g. (b1–b3). In general, mid-range segment lengths still provide better clustering effects, e.g. (b4–b6), for further analysis.

For the six 2D maps in Fig. 4 (c1–c6) for the file *hc256* with *r* = 1, visual distributions similar to (a1–a6) and significantly different from (b1–b6) can be observed; the number of main visible clusters decreases less significantly as the segment length increases, e.g. (c3–c6). However, shorter segment lengths do provide refined visual distinctions with regions in fuzzy areas, e.g. (b1). In general, mid-range segment lengths still provide better clustering effects, e.g. (c2–c4), for further analysis. From their distributions, groups (a) and (c) share much stronger similarities with each other than with group (b).

**Fig. 4** Three groups of eighteen 2D maps in the range of *n* = 3 ∼ 50, *k* = 7, *N* ∼ 200 ∼ 600, *T* = 2700; (a1–a6) Map<sub>*A*</sub> for the file *right*; (b1–b6) Map<sub>*G*</sub> for the file 201–500; (c1–c6) Map<sub>*A*</sub> for the file *hc256*, mode = 1, *r* = 1

**Fig. 5** Four groups of sixteen 2D maps in the range of *n* = 15, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700 for the file *right*; **a** group (a1–a4) four Map<sub>*A*</sub> maps; **b** group (b1–b4) four Map<sub>*T*</sub> maps; **c** group (c1–c4) four Map<sub>*G*</sub> maps; **d** group (d1–d4) four Map<sub>*C*</sub> maps

**Fig. 6** Four groups of sixteen 2D maps in the range of *n* = 12, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700 for the file *hc256*, *r* = 1, mode = 1; **a** group (a1–a4) four Map<sub>*A*</sub> maps; **b** group (b1–b4) four Map<sub>*T*</sub> maps; **c** group (c1–c4) four Map<sub>*G*</sub> maps; **d** group (d1–d4) four Map<sub>*C*</sub> maps

It is interesting to observe how the maps differ as the control parameter *k* changes. Four groups of sixteen 2D maps for the file *right* are shown in Fig. 5 in the range of *n* = 15, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700; each of the four groups (a)–(d) provides four maps that share the same parameters except for the *k* value. Checking the visible clusters in different maps, nearly the same number of clusters is identified within a group, but different groups may contain significantly different numbers. A smaller *k* value (e.g. *k* = 2) produces a tighter distribution and a larger *k* value (e.g. *k* = 7) gives better separation on the maps. Though the *k* = 7 maps provide better separation, their *y*-axis values are already in the 10<sup>8</sup> range.

Four groups of sixteen 2D maps for the file *hc256* are shown in Fig. 6 in the range of *n* = 12, *k* = {2, 3, 4, 7}, *N* ∼ 500, *T* = 2700, *r* = 1. This group of 2D maps can be compared with the 2D maps in Fig. 5. Under the same parameters, similar visible effects and feature clustering properties can be observed as the *k* value varies.

**Fig. 7** Two groups of eight 2D maps in the range of *n* = 15, *k* = 7, *N* ∼ 200 ∼ 600, *T* = 2700; **a** group (a1–a4) four Map<sub>*V*</sub> maps for the file *left*; **b** group (b1–b4) four Map<sub>*V*</sub> maps for the file *right*

**Fig. 8** Three groups of twelve 2D maps in the range of *n* = 12, *k* = 7, *N* = 500, *T* = 2700 for the file *hc256*, *r* = {1, 2, 3}, mode = 1; **a** group (a1–a4) four Map<sub>*V*</sub> maps for *r* = 1; **b** group (b1–b4) four Map<sub>*V*</sub> maps for *r* = 2; **c** group (c1–c4) four Map<sub>*V*</sub> maps for *r* = 3

**Fig. 9** Three groups of twelve maps in the ranges: *N*=*500, T*=*2700, k*=*7*; **a** Real DNA data; (a1–4) DNA sequences from the file *right*; (**b**–**c**) Simulation data; (b1–4) Binary sequences from the file *hc256*, *r*=*1*; (c1–4) Binary sequences from the file *hc256*, *r*=*3*

Using a set of selected parameters, two groups of eight 2D maps are compared in Fig. 7 for the two files *left* and *right* to explore higher levels of symmetric properties for secondary or higher-level structures potentially contained in DNA sequences. The selected parameters are *n* = 15, *k* = 7, *N* ∼ 500, *T* = 2700. Group (a) provides four Map<sub>*V*</sub> maps (a1–a4) for the file *left*; group (b) provides four Map<sub>*V*</sub> maps (b1–b4) for the file *right*.

For convenience, let ~ be a similarity operator. For groups (a) and (b), the four pairs of maps {(a1)~(b1), (a2)~(b2), (a3)~(b3), (a4)~(b4)}, i.e. (*left*-A~*right*-A, *left*-T~*right*-T, *left*-G~*right*-G, *left*-C~*right*-C), have strongly similar distributions between *left* and *right*. In addition, only two clustering classes can be clearly identified, {(a1)~(a2)~(b1)~(b2), (a3)~(a4)~(b3)~(b4)}, i.e. (*left*-A~*right*-A~*left*-T~*right*-T, *left*-G~*right*-G~*left*-C~*right*-C). Such similar clustering distributions may indicate that the eight maps reflect higher-level structures of the DNA sequences, with additional A–T and G–C symmetric pairings between the two files *left* and *right*.

Using a set of selected parameters, three groups of twelve 2D maps are listed in Fig. 8 for the file *hc256*, *r* = {1, 2, 3}, to explore properties of higher-level structures potentially contained in pseudo DNA sequences. The selected parameters are *n* = 12, *k* = 7, *N* ∼ 500, *T* = 2700. Group (a) provides four Map<sub>*V*</sub> maps (a1–a4) for *r* = 1; group (b) provides four Map<sub>*V*</sub> maps (b1–b4) for *r* = 2; group (c) provides four Map<sub>*V*</sub> maps (c1–c4) for *r* = 3. Using the similarity operator ~, for groups (a–c) the four triples of maps {(a1)~(b1)~(c1), (a2)~(b2)~(c2), (a3)~(b3)~(c3), (a4)~(b4)~(c4)}, i.e. (A(*r* = 1)~A(*r* = 2)~A(*r* = 3), …, C(*r* = 1)~C(*r* = 2)~C(*r* = 3)), have strongly similar distributions across *r* = {1, 2, 3}. In addition, only two clustering classes can be clearly identified, {(a1)~(a2)~(b1)~(b2)~(c1)~(c2), (a3)~(a4)~(b3)~(b4)~(c3)~(c4)}, i.e. the three groups of maps fall into the classes (A~T, G~C).

For comparison, using a set of selected parameters, three groups of twelve 2D maps are compared in Fig. 9 for the files *right* and *hc256*, *r* = {1, 3}, to check the distribution properties of both real DNA and generated pseudo DNA sequences. Group (a) provides four Map<sub>*V*</sub> maps (a1–a4) for the file *right*; groups (b) and (c) provide four Map<sub>*V*</sub> maps each, (b1–b4) for *hc256*, *r* = 1 and (c1–c4) for *hc256*, *r* = 3.

Using a weak similar operator , for groups (a–c), four pairs of {(a1)(b1)~(c1), (a2)(b2)~(c2), (a3)∼(b3)~(c3), (a4)∼(b4)~(c4)} maps have a stronger similar distribution between *r*=*{1,3}* and a weak similar distribution on A and T cases. In addition, only two clustering classes could be significantly identified as {(a1)~(a2)(b1)~(b2)~(c1)~(c2), (a3)~(a4)~(b3)~(b4)~(c3)~(c4)} i.e. three maps are strongly shown in relationships among (A~|T, G~C) for different cases respectively.

In addition, this set of results provides direct visual comparisons showing strong similarity between DNA and pseudo DNA on VMS maps. Their similar clustering distributions may indicate that the Stream Cipher mechanism offers a comparable mechanism for expressing real DNA sequences, with additional A–T and G–C symmetric pairings in their higher-level relationships.

## **5 Conclusion**

This chapter proposes an architecture to support the Variant Map System. Using a binary random sequence as input, a set of special pseudo DNA sequences can be generated. Through variant measures, probability measurements and normalized histograms, a pair of values can be determined by a series of controlled parameters. Collecting the relevant pairs over multiple DNA sequences, four 2D maps can be generated.

The main results of this chapter are the description of the VMS architecture in diagrams, together with its main components, modules, expressions and important equations. Core models, diagrams and sample results are illustrated using two types of data sets, one selected from real DNA sequences and one generated from pseudo random sequences of the Stream Cipher HC-256, for comparison under VMS testing. Once a proper set of parameters is selected, suitable visual distributions can be observed using the VMS. The results in Figs. 4, 5, 6, 7, 8 and 9 provide systematic evidence that the proposed VMS is useful for checking higher levels of symmetric or similar properties among complex DNA sequences in both natural and artificial environments.

This construction could provide useful insights into spatial information on complex DNA expressions, especially on large encoding RNA/DNA constructions, via 2D maps to explore higher levels of complex interactive environments in the near future.

**Acknowledgements** Thanks to the school of software Yunnan University, to the key laboratory of Yunnan software engineering and the key laboratory for Conservation and Utilization of Bioresource for excellent working environment, to the Yunnan Advanced Overseas Scholar Project (W8110305), the Key R&D project of Yunnan Higher Education Bureau (K1059178) and National Science Foundation of China (61362014) for financial supports to this project. This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014) and Yunnan Advanced Overseas Scholar Project.

#### **References**



# **Whole DNA Sequences of** *Cebus capucinus* **on Variant Maps**

**Yuyuan Mao, Jeffrey Zheng and Wenjia Liu**

**Abstract** DNA sequences as big data streams have been researched for years. However, existing research methods have various limitations when applied to whole DNA sequences. In this chapter, a new scheme is proposed to map whole DNA sequences as 2D maps; the whole DNA sequence of the capuchin monkey (*Cebus capucinus*), a primate, is used as an example to demonstrate the mapping results.

**Keywords** Gene sequence · *Cebus capucinus* · Mapping method · Sequential model · Variant map

## **1 Introduction**

In modern biology, DNA sequences from a wide range of species, from humans to simple cells, are being sequenced and stored in DNA data banks as big data streams. It is difficult to process various DNA streams for classification and identification of species from whole sequences. The main task of present genomic research [1, 2] is to obtain more biological information by processing and analyzing DNA sequences from multiple angles and at multiple levels [4–7]. In recent years, biological gene data have been processed and used in a variety of ways, such as gene feature extraction, gene sequence location [7–9], and so on.

Y. Mao, School of Software, Yunnan University, Kunming, China, e-mail: m805792943@foxmail.com

J. Zheng (✉), Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China, e-mail: conjugatelogic@yahoo.com

J. Zheng, Key Laboratory of Software Engineering of Yunnan, Yunnan University, Kunming, China

W. Liu, Yunnan University, Kunming, China, e-mail: 8avalon8@gmail.com

This work was supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014) and Yunnan Advanced Overseas Scholar Project.

Variant map is an emerging technology that handles four symbols as a meta-structure to process random sequences, from cryptographic sequences and DNA sequences [3, 10] to ECG signals. Multiple statistical probability distributions are generated from selected sequences to form 2D–3D visual maps as a representation. This scheme makes whole data sequences more compact and effectively visualized, and the mapping results may be useful for exploring nonlinear complex behaviors of whole genomes. A whole DNA sequence of a night monkey has been mapped [11] on variant maps.

In this chapter, a special scheme is proposed to show a series of mapping results from a selected gene sequence of a capuchin monkey.

#### **2 Process Model**

#### A. *Architecture*

The architecture of the process model is shown in Fig. 1a. The process model consists of five parts: input, processing, measurement, projection, and output. There are three modules: Processing, Measurement, and Projection.

Input: A DNA sequence

Output: A 2D map

Modules: Processing, Measurement, and Projection

Process: From a selected DNA sequence, multiple segments of a fixed length *m* are taken sequentially over the whole sequence in the Processing module. In the Measurement module, the four symbols {*A*, *C*, *G*, *T*} are counted in each segment, which transfers the segment sequence into a measuring sequence of four measures. In the Projection module, a special combination *X*: {*AT*} and *Y*: {*AG*} is selected to turn each set of four measures into a projection position, and the whole measuring sequence is projected onto a 2D map.

#### B. *Processing Module*

From an input DNA sequence, multiple segments are separated by a fixed length *m* to generate a sequence of segments.

Input: a DNA sequence

Output: a sequence of segments

#### C. *Measurement Module*

In this module, shown in Fig. 1b, the numbers of the four symbols {*A*, *G*, *C*, *T*} are counted in each segment. As a result, each count is an integer between 0 and *m*, and the segment sequence is transferred into a measuring sequence of four measures.

Input: a sequence of segments

Output: a sequence of four measures
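The Processing and Measurement modules can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `split_segments` and `count_measures` are hypothetical, and the sketch assumes that a trailing remainder shorter than *m* is discarded and that symbols other than A, C, G, T are ignored.

```python
from collections import Counter


def split_segments(sequence, m):
    """Divide a DNA string sequentially into segments of fixed length m.

    Assumption (not stated in the text): a trailing remainder shorter
    than m is dropped.
    """
    if m <= 0:
        raise ValueError("segment length m must be positive")
    return [sequence[i:i + m] for i in range(0, len(sequence) - m + 1, m)]


def count_measures(segment):
    """Return the four measures (num(A), num(C), num(G), num(T)) of a segment."""
    counts = Counter(segment.upper())
    return tuple(counts[base] for base in "ACGT")


# Toy input; a real run would use a whole genome sequence.
segments = split_segments("ACGTACGGTTAACC", 4)
measures = [count_measures(s) for s in segments]
```

Each tuple in `measures` holds four integers between 0 and *m*, matching the measuring sequence described above.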

#### D. *Projection Module*

The projection module is shown in Fig. 1c as two units: Position and Projecting. For each set of four measures, two axis positions are determined by *X*(*AT*) and *Y*(*AG*), respectively. When all measures are processed, a 2D histogram is established as a statistical distribution, which forms the 2D map.

Input: a sequence of four measures

Output: a 2D map

**Fig. 1** Architecture of mapping scheme (**a**)–(**c**). **a** Architecture; **b** Measurement module; **c** Projection module

#### **3 Details**

#### A. *Relevant Parameters*

*m*: segment length

*V*: a combination of two bases: {*AT*, *AG*}

$$\text{num}(AT) = \text{num}(A) + \text{num}(T)$$

$$\text{num}(AG) = \text{num}(A) + \text{num}(G)$$

$$P\_V = \frac{\text{num}(V)}{m}$$

*P<sub>V</sub>*: the proportion of a base or combinatorial base

(*X*: *P<sub>AT</sub>*, *Y*: *P<sub>AG</sub>*): a pair of *XY* mapping positions.

#### B. *Parameter in Module*

Since the quality of a generated map depends on the number of projection points, a refined map requires a large number of coordinate points. The mapping projection superposes these coordinate points in a 2D histogram, which is rendered as a color map.

#### C. *Measurement Module*

The proportions of *AT* and *AG* in each segment are calculated. The two proportions form a coordinate (*X<sub>i</sub>* = *P<sub>AT</sub>*, *Y<sub>j</sub>* = *P<sub>AG</sub>*), which maps to a point on the two-dimensional graph.

The mapping relation between *X* and *Y*:

$$X: P\_{AT}, \quad Y: P\_{AG}$$

A distinct graph requires a large number of coordinate points, and only a large amount of DNA sequence data can provide enough coordinate points for a clear projection result. The graphics projection module completes the superposition of these coordinate points.
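The projection step can be illustrated with a short Python sketch. It is not the authors' code: the names `project_map` and `to_grid` are hypothetical, incomplete trailing segments are assumed to be dropped, and the histogram is indexed by the integer counts num(*AT*) and num(*AG*), on the assumption that the proportions are these counts taken relative to the segment length *m*.

```python
from collections import Counter


def project_map(sequence, m):
    """Build a 2D histogram keyed by (num(AT), num(AG)) for segments of length m.

    Dividing either index by m gives the proportions P_AT and P_AG
    (assumption: proportions are relative to the segment length).
    """
    histogram = Counter()
    for start in range(0, len(sequence) - m + 1, m):
        segment = sequence[start:start + m].upper()
        num_a = segment.count("A")
        x = num_a + segment.count("T")  # num(AT)
        y = num_a + segment.count("G")  # num(AG)
        histogram[(x, y)] += 1          # superpose coinciding points
    return histogram


def to_grid(histogram, m):
    """Expand the histogram into an (m + 1) x (m + 1) matrix, indexed grid[y][x]."""
    grid = [[0] * (m + 1) for _ in range(m + 1)]
    for (x, y), z in histogram.items():
        grid[y][x] = z
    return grid


hist = project_map("ACGTACGTAATTGGCC", 4)
grid = to_grid(hist, 4)
```

The matrix `grid` can then be passed to any plotting routine that renders cell values as colors; cells that stay at zero correspond to positions onto which no segment is projected.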

#### **4 Results Display**

#### *4.1 Maps on Various Segmented Length*

Maps for different parameters are shown in Fig. 2a–l for *m* = {20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 200}, Fig. 3a–f for *m* = {54, 56, 58, 60, 62, 64}, Fig. 4a–d for *m* = {59, 60, 61, 62}, and Fig. 5 for *m* = 60, respectively.

In each map, pixels of similar color correspond to clusters with a similar number of segments.

#### *4.2 Brief Analysis*

From Fig. 2, it is interesting to notice that when *m* < 50, the maps are more symmetric than those for larger values of *m*. As the segment length changes, significant patterns appear in the region *m* = 54–64 shown in Fig. 3, and maps for more finely spaced lengths are shown in Fig. 4.

From visual observation, the map for *m* = 60 shows the best effect.

#### **5 Conclusion**

Using the proposed mapping scheme, it is feasible to transfer a whole DNA sequence into a color map with significant visual features. In addition to the mapping method and selected functions, a set of sample maps for various segment lengths illustrates colorful distributions as variant maps.

By checking symmetric information among different maps, it is possible to identify specific spatial features under different configurations.

**Fig. 2** Variant maps of *Cebus capucinus* on various segmented lengths (**a**)–(**l**) *m* = {20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 200}. **a** *m* = 20; **b** *m* = 30; **c** *m* = 40; **d** *m* = 50; **e** *m* = 60; **f** *m* = 70; **g** *m* = 80; **h** *m* = 90; **i** *m* = 100; **j** *m* = 120; **k** *m* = 150; **l** *m* = 200

**Fig. 3** Variant maps of *Cebus capucinus* on various segmented lengths (**a**)–(**f**) *m* = {54, 56, 58, 60, 62, 64}. **a** *m* = 54; **b** *m* = 56; **c** *m* = 58; **d** *m* = 60; **e** *m* = 62; **f** *m* = 64

**Fig. 4** Variant maps of *Cebus capucinus* on various segmented lengths (**a**)–(**d**) *m* = {59, 60, 61, 62}. **a** *m* = 59; **b** *m* = 60; **c** *m* = 61; **d** *m* = 62

**Fig. 5** Variant map of *Cebus capucinus* on segmented length *m* = 60

Since this is an initial step toward mapping a whole DNA sequence, further research and exploration are required.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part IX Applications—Multiple Valued Sequences

Experience without theory is blind, but theory without experience is mere intellectual play. —Immanuel Kant

Make everything as simple as possible, but not simpler.

—Albert Einstein

Science cannot progress without reliable and accurate measurement of what it is you are trying to study. The key is measurement, simple as that.

—Robert D. Hare

To process multiple valued sequences, it is necessary to use more complex structures in the transformation. Various signals such as ECG, EEG, and BEC (Bat Echolocation Calls) were tested. Since 2016, various papers have been published on ECG processing, for example: Variant Maps on Normal and Abnormal ECG Data Sequences, Biol Med (Aligarh) 8:336. https://doi.org/10.4172/0974-8369.1000336; Mapping ECG Signals on Variant Maps, https://doi.org/10.1145/3110025.3110134; Visualization of P wave characteristics in ECG, https://doi.org/10.1109/CISP-BMEI.2017.8302247.

This part of multiple valued sequences is composed of two chapters (25 and 26).

Chapter "Successful Creation of Regular Patterns in Variant Maps from Bat Echolocation Calls" processes BEC signals on variant maps and classifies the resulting maps into two distinct groups.

Chapter "Visual Analysis of ECG Sequences on Variant Maps" applies visual analysis to ECG sequences on variant maps; various normal and abnormal ECG sequences are selected for comparison, and significant characteristics of the various distributions are observed.

# **Successful Creation of Regular Patterns in Variant Maps from Bat Echolocation Calls**

**D. M. Heim, O. Heim, P. A. Zeng and Jeffrey Zheng**

**Abstract** We created variant maps based on bat echolocation call recordings. Here we outline the transformation process and describe the resulting visual features. The maps show regular patterns, while characteristic features change when the properties of the bat call recordings change. By focusing on specific visual features, we found a set of projection parameters which allowed us to classify the variant maps into two distinct groups. These results are promising indicators that variant maps can be used as a basis for new echolocation call classification algorithms.

**Keywords** Echolocation · Algorithms · Morphometry · Fourier analysis · Quaternions

D. M. Heim Key Laboratory of Quantum Information of Yunnan, Yunnan University, Kunming, China e-mail: dennis.heim@gmx.net

O. Heim Animal Ecology, Institute of Biochemistry and Biology, University of Potsdam, 14469 Potsdam, Germany e-mail: bats@o-heim.de

O. Heim Leibniz Institute for Zoo and Wildlife Research, 10315 Berlin, Germany

P. A. Zeng Yunnan University, Kunming, China e-mail: 895158562@qq.com

J. Zheng (B) Key Laboratory of Yunnan Software Engineering, Yunnan University, Yunnan 650091, Kunming, China e-mail: conjugatelogic@yahoo.com

This work was supported by NSF of China (61362014), Yunnan Advanced Overseas Scholar Project and the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002).

#### **1 Introduction**

The identification of echolocation calls is essential to the research and conservation of bat species [1]. However, automatic classification algorithms have not yet been proven capable of providing 100% correct classifications or getting close enough to this ideal performance [2]. Since our approach of using variant maps [3] already shows promising results, we are confident that it will continue to add valuable contributions to the field of automatic bat call identification.

Automated bat echolocation call identification algorithms have been developed since the late 1990s [4–7]. At that time, multivariate discriminant function analysis or neural networks were used for the classification of the calls. Since then, other methods have been applied, e.g., algorithms of pattern recognition [8], support vector machines [9], hierarchical ensembles of neural networks [9, 10], geometric morphometry [11], machine learning [12], CART [13], and random forest classification [14]. For a critical analysis of the performance of the applied methods, we refer to [2] and the references therein.

Using variant maps for the classification of bat echolocation calls differs completely from these conventional techniques. The main difference is the preprocessing step, where the recordings are transformed into variant maps. This step offers the possibility to analyze the bat call recordings from a completely different point of view. It provides additional degrees of freedom which allow a further optimization of the identification process, e.g., by supplementing the information obtained from a Fourier analysis of the bat calls.

Our method to transform the bat call recordings is based on measures proposed by Zheng [15] in the 1990s to partition special phase spaces in binary image analysis. These methods were extended in the 2010s [3, 16] and successfully used to classify quantum interactions [17, 18], differently encrypted messages [19], and noncoding DNA [20, 21].

Similar to these works, we transform the bat call recordings using variant measures to obtain variant maps. Each recording contains several calls of one bat species. We used calls of four aerial-hawking bat species in this study. Recordings were made at three types of crop fields far away from woody vegetation. The created variant maps have a regular structure, but characteristic features vary strongly with each recording. These results show that variant maps can be used to extract usable information from bat echolocation recordings.

#### **2 Transformation**

The processed bat echolocation calls were recorded with a sampling rate of 500 kHz and saved as "raw" 16-bit audio files. In the following, we describe in four steps (A–D) how we transformed these files into variant maps.

**Step A**: From analogue to digital audio

In a recording of data length *N*, the amplitude of the bat echolocation calls is stored in *N* samples. Each sample corresponds to a floating-point number of 16 bits. For simplicity, we transformed the floating-point numbers to integer numbers of 16 bits.

**Step B**: From digital audio to quaternions

Next, we transform the integer sequence into a sequence of four metastates {⊥, +, −, ⊤} which resemble the quaternions {Bottom, Plus, Minus, Top}. For this step, we select the *i*-th sample *A<sub>i</sub>* and its next neighbor *A*<sub>*i*+1</sub> and define the difference Δ*A* = *A*<sub>*i*+1</sub> − *A<sub>i</sub>* and local average *L* = (*A<sub>i</sub>* + *A*<sub>*i*+1</sub>)/2. Additionally, we require the maximum *A*<sub>max</sub> and minimum *A*<sub>min</sub> of the current sequence to define a middle value *V* = (*A*<sub>min</sub> + *A*<sub>max</sub>)/2 and we define a tolerance *T*. Using these values, we transform the integer sequence *A*<sub>1</sub> ··· *A<sub>N</sub>* into a sequence of quaternions *B*<sub>1</sub> ··· *B<sub>N</sub>* using the rules

$$\begin{aligned} \text{if} \quad \Delta A &< T \quad \text{and} \quad L > V: \quad B\_i = \top\\ \text{if} \quad \Delta A &< T \quad \text{and} \quad L \le V: \quad B\_i = \bot\\ \text{if} \quad \Delta A &\ge T \quad \text{and} \quad A\_i > A\_{i+1}: B\_i = -\\ \text{if} \quad \Delta A &\ge T \quad \text{and} \quad A\_i < A\_{i+1}: B\_i = + \end{aligned}$$
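The four rules can be expressed as a small Python function. The sketch below is an illustration under stated assumptions, not the authors' code: the name `to_metastates` and the state labels are hypothetical; the tolerance test is applied to the magnitude |Δ*A*|, which the rules for + and − appear to require for a positive *T*; and because the last sample has no next neighbor, the sketch returns *N* − 1 states. The sample values are invented for illustration.

```python
def to_metastates(samples, tolerance):
    """Map consecutive sample pairs to 'TOP', 'BOTTOM', 'PLUS' or 'MINUS'.

    Assumptions: the tolerance is compared with |A[i+1] - A[i]|, and the
    last sample (which has no next neighbour) produces no state.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if len(samples) < 2:
        return []
    middle = (min(samples) + max(samples)) / 2  # middle value V
    states = []
    for current, following in zip(samples, samples[1:]):
        local_average = (current + following) / 2  # L
        if abs(following - current) < tolerance:
            # Nearly flat signal: decide by its level relative to V.
            states.append("TOP" if local_average > middle else "BOTTOM")
        elif current > following:
            states.append("MINUS")  # falling edge
        else:
            states.append("PLUS")   # rising edge
    return states


states = to_metastates([2, 3, 12, 22, 21, 7], tolerance=3)
```

For this toy input the middle value is 12, so the flat pair near the bottom of the range is labeled `BOTTOM` and the flat pair near the top is labeled `TOP`, while the steep rises and the final drop become `PLUS` and `MINUS`.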

As an example, the values *T* = 4 and *V* = 10 lead to the sequence


**Step C**: From quaternions to meta-measures

We subdivide the quaternion sequence into segments of length *M* and obtain, in this way, *S* = *N*/*M* segments. For each segment, we define four meta-measures {*M*<sub>⊥</sub>, *M*<sub>+</sub>, *M*<sub>−</sub>, *M*<sub>⊤</sub>}. Each measure represents the number of associated quaternions in one segment. These meta-measures satisfy the relations 0 ≤ *M*<sub>⊥</sub>, *M*<sub>+</sub>, *M*<sub>−</sub>, *M*<sub>⊤</sub> ≤ *M* and *M*<sub>⊥</sub> + *M*<sub>+</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub> = *M*. The quaternion sequence with *N* units is now represented by *S* segments where each segment contains four meta-measures.
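Step C amounts to counting states per segment, as the following Python sketch shows. The function name `meta_measures` and the string labels for the four states are hypothetical, and the sketch assumes that an incomplete trailing segment is dropped when *N* is not a multiple of *M*.

```python
from collections import Counter

# Order of the returned measures: (M_bottom, M_plus, M_minus, M_top).
STATES = ("BOTTOM", "PLUS", "MINUS", "TOP")


def meta_measures(states, segment_length):
    """Return one tuple of four meta-measures per complete segment."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    measures = []
    last_start = len(states) - segment_length
    for start in range(0, last_start + 1, segment_length):
        counts = Counter(states[start:start + segment_length])
        measures.append(tuple(counts[state] for state in STATES))
    return measures


example_states = ["BOTTOM", "PLUS", "PLUS", "TOP",
                  "MINUS", "MINUS", "TOP", "TOP"]
example_measures = meta_measures(example_states, 4)
```

Each returned tuple sums to the segment length, which is the constraint stated above.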

**Step D**: From meta-measures to variant maps

There are many possibilities to combine meta-measures for the creation of variant maps [3, 15–21]. To transform the bat echolocation calls into 2D color maps, we defined for each segment of meta-measures the axis values *X* = *M*<sub>+</sub> + *M*<sub>⊥</sub> and *Y* = *M*<sub>⊥</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub>. One *Z* value is obtained by counting the number of segments where one specific *X*–*Y* combination was found. Each *Z* value is represented by a color in an (*M* + 1) × (*M* + 1) matrix.

As an example, we depicted in Fig. 1 the variant map of an echolocation call recording from the bat species *Nyctalus noctula*. It has a data length *N* = 967,139 and we chose a segment length *M* = 237. At the position *X* = 80 and *Y* = 200 marked by a white circle, the color indicates a value *Z* = 10. That is, we found 10 segments where the conditions *M*<sub>+</sub> + *M*<sub>⊥</sub> = 80 and *M*<sub>⊥</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub> = 200 apply.

**Fig. 1** The variant map of an echolocation call recording from the species *Nyctalus noctula* created by following the processing steps A–D described in Sect. 2. We highlighted the position *X* = 80 and *Y* = 200 by a white circle to illustrate the processing step D. At this position, the conditions *M*<sub>+</sub> + *M*<sub>⊥</sub> = 80 and *M*<sub>⊥</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub> = 200 apply. Further visual features are discussed in Sect. 3 in more detail

White areas indicate regions without any projection point on this sequence. For a discussion of further visual features which appear in this figure we refer to Sect. 3.

These types of maps offer the possibility to visualize long data sequences with >10<sup>6</sup> samples on compact matrices. We use this scheme to transform each bat call recording into a 2D color figure. It can be optimized for the identification of bat species, recording locations or times.
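Step D can be sketched as follows. The function name `variant_map` is hypothetical, and the sketch assumes each input tuple is ordered as (*M*<sub>⊥</sub>, *M*<sub>+</sub>, *M*<sub>−</sub>, *M*<sub>⊤</sub>) and sums to the segment length *M*; the matrix is indexed as `z[y][x]`, which is a presentation choice rather than something fixed by the text.

```python
def variant_map(measures, segment_length):
    """Count segments per (X, Y) position in an (M + 1) x (M + 1) matrix.

    measures: iterable of (m_bottom, m_plus, m_minus, m_top) tuples,
    each assumed to sum to segment_length.
    """
    size = segment_length + 1
    z = [[0] * size for _ in range(size)]
    for m_bottom, m_plus, m_minus, m_top in measures:
        x = m_plus + m_bottom            # X = M+ + M_bottom
        y = m_bottom + m_minus + m_top   # Y = M_bottom + M- + M_top
        z[y][x] += 1                     # Z counts coinciding segments
    return z


z = variant_map([(1, 2, 0, 1), (0, 0, 2, 2), (1, 2, 0, 1)], 4)
```

In this toy case two segments share the position *X* = 3, *Y* = 2, so that cell holds *Z* = 2, in the same way that the circled cell in Fig. 1 holds *Z* = 10.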

#### **3 Variant Maps**

Our main result is that all variant maps created from bat echolocation calls show regular patterns while characteristic visual features vary with each recording. In the following, we describe the data we processed in detail and discuss the visual features we observed.

#### *3.1 Data Description*

We processed 44 files which were recorded in August 2012 in the Uckermark region (Brandenburg, Germany) [22]. Each recording contains only calls of one of the four European bat species *Nyctalus noctula*, *Pipistrellus nathusii*, *Pipistrellus pipistrellus*, or *Pipistrellus pygmaeus*. These files were recorded on arable fields cultivated with three different crop types: corn (C), rapeseed (R), or wheat (W). The recording length varies between 30 s and 2 min.

#### *3.2 Visual Features*

We transformed all 44 files of bat calls into variant maps by steps A to D described in Sect. 2. That is, we used the axis values *X* = *M*<sub>+</sub> + *M*<sub>⊥</sub> and *Y* = *M*<sub>⊥</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub> and a segment length *M* = 237. By focusing on the visual features, we clustered the resulting maps into two groups. A typical member of each group is shown in Fig. 2.

One group consists only of maps showing patterns which have two significant maxima with values >10<sup>5</sup>. We call members of this group **double-maxima** maps. The example shown in Fig. 2a has maxima at the positions *X* = 0, *Y* = 237 and *X* = 120, *Y* = 200. Besides these two maxima, there are distinct positions on diagonal areas with values of the orders 1–10<sup>3</sup>.

All other maps belong to the group of **non-double-maxima** maps. As an example, the map in Fig. 2b has its significant maximum at the position *X* = 0, *Y* = 237 while other projection regions have values of the orders 1–10<sup>3</sup>. In addition, most values of interest are located around a diagonal region and form a slanted band on the map.

All 44 resulting maps are shown in Figs. 3 and 4. They are separated into **double-maxima** maps (Fig. 3) and **non-double-maxima** maps (Fig. 4). In principle, it is possible to further subdivide the variant maps by identifying additional visual features. However, since we did not yet find a direct connection between visual features and bat call properties, a further subdivision goes beyond the scope of this manuscript and will be the topic of a future publication.
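The grouping above was made by visual inspection. As a rough illustration of how the same criterion might be automated, the following Python sketch counts local maxima above a threshold; it is a hypothetical helper and not the procedure used in this chapter. The 8-neighbor definition of a local maximum and the reading of "two significant maxima" as exactly two are assumptions.

```python
def significant_maxima(z, threshold=10**5):
    """Return (x, y) positions of strict local maxima of z above threshold.

    Assumption: a cell is a maximum if it exceeds all of its (up to 8)
    neighbours; z is a rectangular list of lists indexed z[y][x].
    """
    rows = len(z)
    cols = len(z[0]) if rows else 0
    peaks = []
    for y in range(rows):
        for x in range(cols):
            value = z[y][x]
            if value <= threshold:
                continue
            neighbours = [
                z[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(value > n for n in neighbours):
                peaks.append((x, y))
    return peaks


def is_double_maxima(z, threshold=10**5):
    """Assumed criterion: exactly two significant maxima."""
    return len(significant_maxima(z, threshold)) == 2


# Toy 4 x 4 map with a lowered threshold, for illustration only.
toy = [[0, 0, 0, 9],
       [0, 0, 0, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0]]
toy_peaks = significant_maxima(toy, threshold=5)
```

With the real maps, `z` would be the 238 × 238 matrix obtained for *M* = 237 and the default threshold of 10<sup>5</sup> would apply.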

#### *3.3 Discussion*

On all generated maps, the positions in the lower-left triangular area are empty. This is because our choice of axes obeys *X* + *Y* ≥ *M*. Empty positions in the upper-right area appear because the bat call recordings consist of discrete short pulses with a longer time period of silence in between.
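The inequality follows directly from the definitions in Sect. 2. Adding the two axis values and using the constraint *M*<sub>⊥</sub> + *M*<sub>+</sub> + *M*<sub>−</sub> + *M*<sub>⊤</sub> = *M* from Step C gives

$$X + Y = (M\_{+} + M\_{\bot}) + (M\_{\bot} + M\_{-} + M\_{\top}) = M + M\_{\bot} \ge M,$$

with equality exactly when *M*<sub>⊥</sub> = 0. No segment can therefore be projected below the anti-diagonal *X* + *Y* = *M*.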

Similarly, other visual characteristics in the colored areas can be directly related to properties of the bat call recordings. As an example, a signal of constant frequency can be transformed into a single position on a variant map by choosing suitable parameters. This means that by optimizing the variant map transformation, it is possible to focus on features of the initial bat echolocation call for the creation of variant maps.

**Fig. 2** Variant maps of **a** *Pipistrellus nathusii* and **b** *Nyctalus noctula*, both recorded on a rapeseed field. The figures were created by applying the transformation process described in Sect. 2. **a** shows a typical **double-maxima** map with two significant maxima, while **b** belongs to the group of **non-double-maxima** maps

**Fig. 3** These variant maps show **double-maxima** patterns. They have two significant maxima with values >10<sup>5</sup>. The axis ranges are the same as in Fig. 2. Each map originates from a bat echolocation recording on a corn (C), rapeseed (R), or wheat (W) field

This is the first time to our knowledge that quaternion structures have been used to transform bat calls. Our transformation process could be used to add optimizing parameters to current bat call identification schemes and in this way form the basis for a new identification algorithm.

**Fig. 4** These variant maps show **non-double-maxima** patterns. That is, they explicitly do not have two distinct maxima with values >10<sup>5</sup> in contrast to the **double-maxima** maps shown in Fig. 3

#### **4 Summary and Outlook**

We transformed 44 bat echolocation files into variant maps. All created variant maps have a similar structure and can be classified by focusing on specific visual features. As an example, we found a set of projection parameters which allowed us to classify the recordings into **double-maxima** and **non-double-maxima** maps.

Features like this can be traced back to the signal nature of the recordings. In this way, variant maps offer the possibility to focus on individual features of bat echolocation calls. Since there are many possible combinations to create variant maps, we are confident that a suitable projection combination can be found to fulfill our ultimate goal of identifying single bat species.

In order to meet this target, it is necessary to process a much higher number of bat calls to create a sufficiently large database for the effective determination of possible projections and associated maps. This would form the perfect basis for the development of a new echolocation call identification algorithm.

**Acknowledgements** We thank C. C. Voigt for providing the processed bat echolocation data and S. A. Troxell for revising the manuscript. Financial support by the National Science Foundation of China NSFC (No. 61362014) and the Overseas Higher-level Scholar Project of Yunnan Province, China, is gratefully acknowledged. Moreover, we appreciate the financial support by the Federal Ministry of Science, Research and Culture in Brandenburg, the University of Potsdam, the Leibniz Institute for Zoo and Wildlife Research and the Deutsche Forschungsgemeinschaft (Vo 890/22).

## **References**



# **Visual Analysis of ECG Sequences on Variant Maps**

**Zhihui Hou and Jeffrey Zheng**

**Abstract** This chapter presents a variant measurement method based on variant logic, which takes ECG sequences as the signal source and outputs variant maps of the ECG sequences. It provides a supplementary study for ECG detection. Samples of ECG signals were collected from the First People's Hospital of Yunnan Province. Using variant maps, the main parameters are checked for various interval values and the corresponding maps are illustrated.

**Keywords** Arrhythmia · Visualization · ECG sequences · Variant map

## **1 Introduction**

Cardiovascular disease is a worldwide concern [1]. Research on cardiovascular diseases relies mainly on the detection of ECG signals. The electrocardiogram represents cardiac function as graphic signals [2] and is an important means of diagnosing abnormal cardiac activity.

ECG signals are the product of a wide range of clinical ECG techniques. In recent years, research methods for ECG signals have made significant progress: machine learning [3], neural networks, clustering [4], partial fractal dimension [5], wavelet transform [6], and other methods have been used to detect and classify arrhythmia. The most typical representative of the emerging ECG research methods is the ECG scattergram [7–9].

Z. Hou Yunnan University, Kunming, China e-mail: 1660919714@qq.com

J. Zheng (B) Key Laboratory of Yunnan Software Engineering, Yunnan University, Kunming, China e-mail: conjugatelogic@yahoo.com

Project supported by the Key Project on Electric Information and Next Generation IT Technology of Yunnan (2018ZI002), NSF of China (61362014), Yunnan Advanced Overseas Scholar Project.

© The Author(s) 2019

J. Zheng (ed.), *Variant Construction from Theoretical Foundation to Applications*, https://doi.org/10.1007/978-981-13-2282-2\_26

**Fig. 1** The overall structure of the variant map for ECG

The variant method is an emerging technique for dealing with spatial changes in signal phase. Its application to binary image classification and transformation was proposed in the 1990s [10, 11], and the method has been refined since then [12, 13]. The variant method has been applied to different data samples: quantum sequences [14, 15], random sequences [16], noncoding DNA [17–19], bat echo signals [20], and electrocardiographic signals [21, 22], and effective research results have been obtained for these samples.

This chapter is a further study of the use of variant measurements in the detection of ECG sequences. The sample ECG signals are provided by the First People's Hospital of Yunnan Province. In this chapter, two groups of signals are used: normal ECG signals and abnormal ECG signals. The second part of this chapter describes the variant map for ECG. The third part shows sample results with a brief analysis, and the last part summarizes the chapter.

#### **2 Variant Map for ECG**

The variant map for ECG is composed of six parts: Input, Processing, Segmenting, Statistics, Mapping, and Output. Figure 1 shows the overall structure of the variant map for ECG; each part is described below.

#### A. *Input Part*

The ECG signals used for testing are provided by the hospital as the data source. Let an ECG signal be *p* with *N* elements.

$$p = \left\{ p\_0, \ldots, p\_{N-1} \right\}$$

#### B. *Processing Part*

In the processing part, a multivalued ECG signal sequence is transformed into a four-valued pseudo-DNA sequence.

Input: the ECG sequence

$$p = \left\{ p\_0, \ldots, p\_{N-1} \right\}$$

Parameters: *W* sliding window value; *R* interval value.

Output: a four-valued pseudo-DNA sequence

$$q = \{q\_0, \ldots, q\_{N-1}\}$$

Processing:

Let *p̄<sub>i</sub>* be an average value; *r* be a range value; *t<sub>i</sub>* be a conversion value. The three values are calculated by the equations:

$$\bar{p}\_i = \sum\_{i=0}^{N-1} \frac{p\_i}{W}$$

$$p\_{\text{max}} = \max\{p\_i\}, 0 \le i < N - 1$$

$$p\_{\min} = \min\{p\_i\}, 0 \le i < N - 1$$

$$r = (p\_{\text{max}} - p\_{\text{min}}) \* \frac{R}{2}$$

$$t\_i = \frac{2(p\_i - \bar{p}\_i)}{r \ast R}$$

Transforming rules, for 0 ≤ *i* < *N* − 1:

$$\text{if } t\_i > R > 0 : q\_i = A; \quad \text{if } 0 < t\_i < R : q\_i = G;$$

$$\text{if } 0 > t\_i > -R : q\_i = C; \quad \text{if } 0 > -R > t\_i : q\_i = T.$$
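The processing part can be sketched in Python as below. Several details are assumptions rather than statements from this chapter: the function name `to_pseudo_dna` is hypothetical; the averaging equation does not spell out the window position, so the sketch uses the mean of up to *W* samples starting at index *i*; and the boundary cases that the rules leave open (*t<sub>i</sub>* equal to 0, *R*, or −*R*) are assigned to *G*, *A*, and *T*, respectively.

```python
def to_pseudo_dna(p, W, R):
    """Convert a numeric sequence p into a string over {A, G, C, T}.

    Assumptions: forward sliding window of up to W samples for the
    local mean; ties t == R -> 'A', t == 0 -> 'G', t == -R -> 'T'.
    """
    if W <= 0 or R <= 0:
        raise ValueError("W and R must be positive")
    if not p:
        return ""
    r = (max(p) - min(p)) * R / 2          # range value
    if r == 0:
        raise ValueError("constant signal: range value r is zero")
    symbols = []
    for i, value in enumerate(p):
        window = p[i:i + W]
        mean = sum(window) / len(window)   # local average (assumed form)
        t = 2 * (value - mean) / (r * R)   # conversion value
        if t >= R:
            symbols.append("A")
        elif t >= 0:
            symbols.append("G")
        elif t > -R:
            symbols.append("C")
        else:
            symbols.append("T")
    return "".join(symbols)


q = to_pseudo_dna([0, 10, 0, 10], W=2, R=1)
```

For this toy signal, samples well below their local mean become `T`, samples well above become `A`, and the last sample, whose window contains only itself, becomes `G`.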

#### C. *Segmenting Part*

Input: *q* = {*q*<sub>0</sub>, ..., *q*<sub>*N*−1</sub>}.

Parameters: *m* is a segment value.

Output: *Q* = {*Q*<sub>0</sub>, ..., *Q<sub>j</sub>*, ..., *Q*<sub>*M*−1</sub>}, 0 ≤ *j* < *M*; *M* is the number of segments and *N* = *m* × *M*.

Processing: the *j*-th element in *Q* = {*Q*<sub>0</sub>, ..., *Q<sub>j</sub>*, ..., *Q*<sub>*M*−1</sub>} is

$$Q\_j = \{q\_{j\ast m}, \dots, q\_{j\ast m+i}, \dots, q\_{j\ast m+m-1}\}, \ 0 \le i < m, 0 \le j < M.$$

#### D. *Statistics Part*

Input: *Q* = {*Q*<sub>0</sub>, ..., *Q<sub>j</sub>*, ..., *Q*<sub>*M*−1</sub>}, 0 ≤ *j* < *M*

Output: *S* = {*S<sup>A</sup><sub>j</sub>*, *S<sup>C</sup><sub>j</sub>*, *S<sup>G</sup><sub>j</sub>*, *S<sup>T</sup><sub>j</sub>*}, 0 ≤ *j* < *M*

*S<sup>A</sup><sub>j</sub>* is the number of *A* elements in *Q<sub>j</sub>*

*S<sup>C</sup><sub>j</sub>* is the number of *C* elements in *Q<sub>j</sub>*

*S<sup>G</sup><sub>j</sub>* is the number of *G* elements in *Q<sub>j</sub>*

*S<sup>T</sup><sub>j</sub>* is the number of *T* elements in *Q<sub>j</sub>*
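The segmenting and statistics parts can be combined in one short Python sketch. The function name `segment_statistics` is hypothetical, and the sketch assumes that any samples beyond the last complete segment are ignored when *N* is not an exact multiple of *m*.

```python
from collections import Counter


def segment_statistics(q, m):
    """Split q into M = len(q) // m segments and count A, C, G, T in each.

    Returns a list of (S_A, S_C, S_G, S_T) tuples, one per segment Q_j.
    """
    if m <= 0:
        raise ValueError("segment value m must be positive")
    stats = []
    for j in range(len(q) // m):
        counts = Counter(q[j * m:(j + 1) * m])  # segment Q_j
        stats.append((counts["A"], counts["C"], counts["G"], counts["T"]))
    return stats


stats = segment_statistics("AACGTTGCGGCA", 4)
```

Each tuple sums to *m*, since every element of a segment is one of the four pseudo-DNA symbols.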

#### E. *Mapping Part*

A pair of two elements in *S* = {*S<sup>A</sup><sub>j</sub>*, *S<sup>C</sup><sub>j</sub>*, *S<sup>G</sup><sub>j</sub>*, *S<sup>T</sup><sub>j</sub>*}, 0 ≤ *j* < *M*, is selected as the mapping object. This chapter selects (*S<sup>C</sup><sub>j</sub>*, *S<sup>G</sup><sub>j</sub>*): *S<sup>C</sup><sub>j</sub>* corresponds to the *X*-axis and *S<sup>G</sup><sub>j</sub>* corresponds to the *Y*-axis. All *M* pairs are mapped onto the 2D map as output.

#### F. *Output*

The results of the mapping are output in the form of 2D variant maps.
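The mapping part reduces to counting how many segments share each (*S<sup>C</sup>*, *S<sup>G</sup>*) position. The following Python sketch illustrates this; the function name `map_cg` is hypothetical, and the input tuples are assumed to be ordered as (*S<sup>A</sup>*, *S<sup>C</sup>*, *S<sup>G</sup>*, *S<sup>T</sup>*).

```python
from collections import Counter


def map_cg(stats):
    """Count segments per (S_C, S_G) position: X is S_C and Y is S_G."""
    return Counter((s_c, s_g) for _s_a, s_c, s_g, _s_t in stats)


cg_map = map_cg([(2, 1, 1, 0), (0, 1, 1, 2), (1, 1, 2, 0)])
```

The resulting counts play the role of color values in the 2D variant map; positions that never occur are simply absent from the counter.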

#### **3 Sample Results and Brief Analysis**

Visualization results of ECG signals obtained by the variant map for ECG show that the morphological features of ECG signals change regularly. Sample results are illustrated and briefly analyzed.

#### A. *Data Source Description*

The ECG signals in this chapter are provided by the First People's Hospital of Yunnan Province. The ECG signals contain a total of 202,626 cases: 104,742 normal cases and 97,884 abnormal cases. For this experiment, 97,884 normal cases and 97,884 abnormal cases were selected.

Since ECG signals have multiple attributes, this chapter processes the P wave attribute of the samples. Figure 2 shows a sample of part of the abnormal ECG data source.

#### B. *Visualization Features*

Using the variant map for ECG, multiple maps can be generated.

An interesting finding is that changes in the parameters affect the spatial characteristics and phase changes of the maps.

In Fig. 3, two 2D maps are illustrated, one normal and one abnormal, with parameters *W* = 24, *R* = 0.95, *m* = 50. *X* and *Y* are (*S<sup>C</sup><sub>j</sub>*, *S<sup>G</sup><sub>j</sub>*), 0 ≤ *j* < *M*. The ECG variant maps show regular characteristics. In Fig. 3a, the normal map for the *P* wave is an oval. In Fig. 3b, the abnormal map for the *P* wave is a stick.

Figure 4 shows a list of normal maps for the *P* wave with parameters *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}. When the parameter *R* increases, the features of the maps show a nonlinear displacement toward the top right corner of the image.


**Fig. 2** A sample of part of the abnormal ECG data source

**Fig. 3** Examples of normal and abnormal ECG variant maps

Figure 5 shows a list of abnormal maps for the *P* wave with parameters *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}. When the parameter *R* increases, the features of the maps show a nonlinear displacement toward the top right corner of the image.

Comparing Figs. 4 and 5, differences between normal and abnormal map features can be observed.

#### **4 Summary and Prospect**

Electrocardiogram (ECG) detection is the key to clinical diagnosis of heart disease and has important clinical value. At present, the automatic analysis function of dynamic ECG detection is not satisfactory. Waveform features of lesions may be too small to be marked, and the characteristics of lesions may even be neglected. Therefore, mining the effective information in massive ECG signals can reduce the blind areas of ECG analysis to some extent, which has certain application value.

**Fig. 4** A list of normal maps for *P* wave on parameters *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}; **a**–**f** maps on *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}

This chapter presents a new scheme of statistical distribution, the variant map for ECG. This method can process massive ECG data sequences as 2D maps with visual characteristics. The sample results show that normal and abnormal ECG signals yield significantly different maps, which can be used to classify arrhythmia characteristics. Further explorations and more experiments are required.

**Fig. 5** A list of abnormal maps for *P* wave on parameters *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}; **a**–**f** maps on *R* = {0.6, 0.72, 0.84, 0.96, 1.08, 1.2}

**Acknowledgements** Thanks to the First People's Hospital of Yunnan Province for ECG data sequences, to National Science Foundation of China NSFC (No. 61362014) and the Overseas Higher level Scholar Project of Yunnan Province, China (No. W8110305) for financial support to the project.

## **References**

